Stochastic encoders for channel coding and lossy source coding are introduced with a rate close to the fundamental limits, where the only restriction is that the channel input alphabet and the reproduction alphabet of the lossy source code are finite. Random numbers, which satisfy a condition specified by a function and its value, are used to construct stochastic encoders. The proof of the theorems is based on the hash property of an ensemble of functions, where the results are extended to general channels/sources and alternative formulas are introduced for channel capacity and the rate-distortion region. Since an ensemble of sparse matrices has a hash property, we can construct a code by using sparse matrices, where the sum-product algorithm can be used for encoding and decoding by assuming that channels/sources are memoryless.
I. Introduction

The aim of this paper is to introduce a channel code and a lossy source code for general 
channels/sources including additive Gaussian, Markov, and non-stationary channels/sources. 
The only assumption is that the input alphabet for channel coding and the reproduction 
alphabet for lossy source coding are finite. We prove that the fundamental limits called 
the channel capacity and the boundary of the rate-distortion region are achievable with the 
proposed codes. We introduce stochastic encoders for constructing the codes, and these encoders can easily be modified to make them deterministic. Let $\mathcal{X}^n$ be the $n$-fold Cartesian power of a set $\mathcal{X}$, and let $x$ denote an element of $\mathcal{X}^n$. To construct stochastic encoders, we use a sequence of random numbers subject to a distribution $\tilde{\mu}$ on $\mathcal{X}^n$ defined as

$$\tilde{\mu}(x) \equiv \begin{cases} \dfrac{\mu(x)}{\sum_{x' : Ax' = c}\mu(x')}, & \text{if } Ax = c,\\ 0, & \text{if } Ax \neq c, \end{cases}$$

for a given probability distribution $\mu$ on $\mathcal{X}^n$, a function $A : \mathcal{X}^n \to \{Ax : x \in \mathcal{X}^n\}$, and $c \in \{Ax : x \in \mathcal{X}^n\}$. We call a generator for this type of random number a constrained random number generator.
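For tiny block lengths, the constrained random number generator can be illustrated by brute force: enumerate $\{x : Ax = c\}$, weight each element by $\mu(x)$, and sample. The Python sketch below is only an illustration under stated assumptions: a binary alphabet, an i.i.d. Bernoulli $\mu$, and a linear $A$ over GF(2) given as a list of rows; all helper names are hypothetical. Exhaustive enumeration takes time exponential in $n$, which is why a practical generator relies on the sum-product algorithm instead (Section VI).

```python
import itertools
import random

def syndrome(A, x):
    """Compute Ax over GF(2); A is a list of row tuples, x a tuple of bits."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)

def iid_bernoulli(p):
    """Return mu(x) for an i.i.d. Bernoulli(p) distribution on binary n-tuples."""
    def mu(x):
        ones = sum(x)
        return p ** ones * (1 - p) ** (len(x) - ones)
    return mu

def constrained_distribution(A, c, mu, n):
    """Return mu_tilde(x) = mu(x) / mu({x' : Ax' = c}) on the coset {x : Ax = c}."""
    coset = [x for x in itertools.product((0, 1), repeat=n) if syndrome(A, x) == c]
    total = sum(mu(x) for x in coset)
    if total == 0:
        raise ValueError("mu(C_A(c)) = 0: no sample exists")
    return {x: mu(x) / total for x in coset}

def constrained_sample(A, c, mu, n, rng=random):
    """Draw one x from mu_tilde by weighted sampling over the coset."""
    dist = constrained_distribution(A, c, mu, n)
    xs = list(dist)
    return rng.choices(xs, weights=[dist[x] for x in xs], k=1)[0]

A = [(1, 1, 0, 1), (0, 1, 1, 1)]   # hypothetical 2 x 4 binary matrix of rank 2
c = (1, 0)
mu = iid_bernoulli(0.2)
x = constrained_sample(A, c, mu, 4, random.Random(0))
print(x, syndrome(A, x))  # the syndrome always equals c = (1, 0)
```

Because this $A$ has full row rank 2, every $c \in \{0,1\}^2$ has a coset of $2^{4-2} = 4$ elements.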

One contribution of this paper is to extend the results of [21] to general channels/sources. 
In [21], the direct part of the channel coding theorem and the lossy source coding theorem for 
a discrete stationary memoryless channel/source are shown based on the hash property of an 
ensemble of functions, which is an extension of random bin coding [4], the set of all linear 
functions [6], and the two-universal class of hash functions [9]. In this paper, alternative 
general formulas for the channel capacity and rate-distortion region are introduced and the 
achievability of the proposed codes is proved based on a stronger version of the hash property
introduced in [22][23][24][25]. Since an ensemble of sparse matrices has a hash property,
we can construct codes by using sparse matrices. 

Another contribution of this paper is a practical algorithm for the proposed code for a (non-stationary) memoryless (asymmetric) channel/source. We introduce a practical algorithm for a constrained random number generator by using a sparse matrix and the sum-product algorithm [1][17], where we assume that the channel/source is (non-stationary) memoryless. There are many ways to construct channel codes [3][11][18] and lossy source codes [12][26][19][32] by using sparse matrices. These approaches assume that the channel/source is stationary memoryless and symmetric, or they use a quantization map [10, Section 6.2] for an asymmetric channel/source. On the other hand, the only requirement for the proposed code is that the input alphabet for channel coding and the reproduction alphabet for lossy source coding are finite.

It should be noted that a similar idea has appeared in [28][33], where the authors introduced random bin coding (privacy amplification) and Slepian-Wolf decoding¹ (information reconciliation) for the construction of codes, and their proofs are based on the fact that the output statistics of random binning are uniformly distributed. Furthermore, their encoding functions appear to be impractical. This paper describes an explicit practical construction of the encoding functions, and the theorems are proved simply and rigorously based on the technique reported in [24], where it is proved that sparse matrices can be used for the construction of codes.

¹It should be noted that the idea of using Slepian-Wolf decoding has already been mentioned in [20][21].

This paper is organized as follows. Section II reviews formulas for the channel capacity and the rate-distortion region based on the information spectrum method introduced in [13][14][31]. Alternative formulas for the channel capacity and the rate-distortion region are also introduced. Section III describes the notion of a hash property, which is stronger than that introduced in [21], together with several lemmas that will be used in the proofs of the theorems. Section IV deals with the construction of a channel code and Section V describes the construction of a lossy source code. Section VI describes an algorithm for a constrained random number generator using the sum-product algorithm; the conversion of stochastic encoders into deterministic encoders is also discussed in that section. Theorems and lemmas are proved in Section VII. Some lemmas are proved in the Appendix.

II. Formal Description of Problems and General Formulas for Channel Capacity and Rate-Distortion Region

This section provides a formal description of the problems and reviews formulas for the channel capacity and the rate-distortion region. All the results in this paper are presented by using the information spectrum method introduced in [13][14][31], where the consistency and stationarity of channels/sources are not assumed. It should be noted that all the results reported in this paper can be applied to stationary ergodic channels/sources and stationary memoryless channels/sources.

Throughout this paper, we denote the probability of an event by $P(\cdot)$ and the probability distribution of a random variable $U$ by $\mu_U$.

We call a sequence $\mathbf{U} \equiv \{U^n\}_{n=1}^{\infty}$ of random variables a general source, where $U^n \in \mathcal{U}^n$. For a general source $\mathbf{U}$, we define the spectral sup-entropy rate $\overline{H}(\mathbf{U})$ and the spectral inf-entropy rate $\underline{H}(\mathbf{U})$ as

$$\overline{H}(\mathbf{U}) \equiv \inf\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{1}{\mu_{U^n}(U^n)} > \theta\right) = 0\right\}$$
$$\underline{H}(\mathbf{U}) \equiv \sup\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{1}{\mu_{U^n}(U^n)} < \theta\right) = 0\right\}.$$
It is known that both $\overline{H}(\mathbf{U})$ and $\underline{H}(\mathbf{U})$ are equal to the entropy rate of $\mathbf{U}$ when $\mathbf{U}$ is stationary ergodic, that is,

$$\overline{H}(\mathbf{U}) = \underline{H}(\mathbf{U}) = \lim_{n\to\infty}\frac{H(U^n)}{n},$$

where $H(U^n)$ is the entropy of $U^n$. When $\mathbf{U}$ is stationary memoryless, both $\overline{H}(\mathbf{U})$ and $\underline{H}(\mathbf{U})$ are equal to the entropy $H(U_1)$.
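As a numerical illustration of the stationary memoryless case, the normalized self-information $\frac{1}{n}\log\frac{1}{\mu_{U^n}(U^n)}$ of an i.i.d. source concentrates around the entropy as $n$ grows, which is why the spectral sup- and inf-entropy rates coincide. The sketch assumes a Bernoulli(0.1) source and logarithms to base 2; the sample sizes and function names are arbitrary choices for illustration.

```python
import math
import random

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def normalized_self_information(p, n, rng):
    """Draw U^n i.i.d. Bernoulli(p) and return (1/n) log2(1 / mu_{U^n}(U^n))."""
    ones = sum(rng.random() < p for _ in range(n))
    return -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n

def spectrum_quantiles(p, n, trials=1000, seed=0):
    """Empirical 1% and 99% quantiles of the normalized self-information."""
    rng = random.Random(seed)
    samples = sorted(normalized_self_information(p, n, rng) for _ in range(trials))
    k = trials // 100
    return samples[k], samples[trials - k - 1]

p = 0.1
for n in (10, 100, 1000):
    lo, hi = spectrum_quantiles(p, n)
    print(n, round(lo, 3), round(hi, 3), "entropy:", round(binary_entropy(p), 3))
```

The interval between the two quantiles shrinks toward $H(U_1) = h(0.1)$ as $n$ increases; for a non-ergodic source the two ends would instead converge to different limits.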

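For a pair $(\mathbf{U},\mathbf{V}) \equiv \{(U^n,V^n)\}_{n=1}^{\infty}$ of general sources, we define the spectral conditional sup-entropy rate $\overline{H}(\mathbf{U}|\mathbf{V})$, the spectral conditional inf-entropy rate $\underline{H}(\mathbf{U}|\mathbf{V})$, the spectral sup-mutual information rate $\overline{I}(\mathbf{U};\mathbf{V})$, and the spectral inf-mutual information rate $\underline{I}(\mathbf{U};\mathbf{V})$ as

$$\overline{H}(\mathbf{U}|\mathbf{V}) \equiv \inf\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{1}{\mu_{U^n|V^n}(U^n|V^n)} > \theta\right) = 0\right\}$$
$$\underline{H}(\mathbf{U}|\mathbf{V}) \equiv \sup\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{1}{\mu_{U^n|V^n}(U^n|V^n)} < \theta\right) = 0\right\}$$
$$\overline{I}(\mathbf{U};\mathbf{V}) \equiv \inf\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{\mu_{U^nV^n}(U^n,V^n)}{\mu_{U^n}(U^n)\,\mu_{V^n}(V^n)} > \theta\right) = 0\right\}$$
$$\underline{I}(\mathbf{U};\mathbf{V}) \equiv \sup\left\{\theta : \lim_{n\to\infty}P\left(\frac{1}{n}\log\frac{\mu_{U^nV^n}(U^n,V^n)}{\mu_{U^n}(U^n)\,\mu_{V^n}(V^n)} < \theta\right) = 0\right\},$$

where $\mu_{U^nV^n}$ is the joint probability distribution corresponding to $(U^n,V^n)$. It is known that both $\overline{H}(\mathbf{U}|\mathbf{V})$ and $\underline{H}(\mathbf{U}|\mathbf{V})$ are equal to the conditional entropy rate of $\mathbf{U}$ given $\mathbf{V}$, and both $\overline{I}(\mathbf{U};\mathbf{V})$ and $\underline{I}(\mathbf{U};\mathbf{V})$ are equal to the mutual information rate between $\mathbf{U}$ and $\mathbf{V}$, when $(\mathbf{U},\mathbf{V})$ is stationary ergodic, that is,

$$\overline{H}(\mathbf{U}|\mathbf{V}) = \underline{H}(\mathbf{U}|\mathbf{V}) = \lim_{n\to\infty}\frac{H(U^n|V^n)}{n}$$
$$\overline{I}(\mathbf{U};\mathbf{V}) = \underline{I}(\mathbf{U};\mathbf{V}) = \lim_{n\to\infty}\frac{I(U^n;V^n)}{n},$$

where $H(U^n|V^n)$ is the conditional entropy of $U^n$ given $V^n$ and $I(U^n;V^n)$ is the mutual information between $U^n$ and $V^n$. When $(\mathbf{U},\mathbf{V})$ is stationary memoryless, both $\overline{H}(\mathbf{U}|\mathbf{V})$ and $\underline{H}(\mathbf{U}|\mathbf{V})$ are equal to the conditional entropy $H(U_1|V_1)$, and both $\overline{I}(\mathbf{U};\mathbf{V})$ and $\underline{I}(\mathbf{U};\mathbf{V})$ are equal to the mutual information $I(U_1;V_1)$.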

A. Channel Capacity 

In the following, we introduce the definition of the channel capacity for a general channel. Let $\mathcal{X}^n$ and $\mathcal{Y}^n$ be the alphabets of a channel input $X^n$ and a channel output $Y^n$, respectively. A sequence $\mathbf{W} \equiv \{\mu_{Y^n|X^n}\}_{n=1}^{\infty}$ of conditional probability distributions is called a general channel.

Definition 1: For a general channel $\mathbf{W}$, we call a rate $R$ achievable if for all $\delta > 0$ and all sufficiently large $n$ there is a pair consisting of an encoder $\varphi_n : \mathcal{M}_n \to \mathcal{X}^n$ and a decoder $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$ such that

$$\frac{1}{n}\log|\mathcal{M}_n| \geq R$$
$$P\left(\psi_n(Y^n) \neq M_n\right) \leq \delta,$$

where $\frac{1}{n}\log|\mathcal{M}_n|$ represents the rate of the code, $M_n$ is a random variable of the message corresponding to the uniform distribution on $\mathcal{M}_n$, and the joint distribution $\mu_{M_nY^n}$ is given as

$$\mu_{M_nY^n}(m,y) \equiv \frac{\mu_{Y^n|X^n}(y|\varphi_n(m))}{|\mathcal{M}_n|}.$$

The channel capacity $C(\mathbf{W})$ is defined as the supremum of the achievable rates.
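For a general channel $\mathbf{W}$, the channel capacity $C(\mathbf{W})$ is derived in [31] as

$$C(\mathbf{W}) = \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}), \tag{1}$$

where the supremum is taken over all general sources $\mathbf{X} \equiv \{X^n\}_{n=1}^{\infty}$ and the joint distribution $\mu_{X^nY^n}$ is given as

$$\mu_{X^nY^n}(x,y) \equiv \mu_{Y^n|X^n}(y|x)\,\mu_{X^n}(x). \tag{2}$$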

We introduce the following lemma, which will be proved in Section VII-A. It should 
be noted that this capacity formula is a straightforward generalization of that obtained by 
Shannon [30]. 

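Lemma 1: For a general channel $\mathbf{W}$,

$$C(\mathbf{W}) = \sup_{\mathbf{X}}\left[\underline{H}(\mathbf{X}) - \overline{H}(\mathbf{X}|\mathbf{Y})\right], \tag{3}$$

where the supremum is taken over all general sources $\mathbf{X}$ and the joint distribution of $(\mathbf{X},\mathbf{Y})$ is given as (2).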
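For a stationary memoryless channel with i.i.d. input, the bracket in (3) reduces to the single-letter quantity $H(X_1) - H(X_1|Y_1) = I(X_1;Y_1)$. The sketch below assumes a discrete memoryless channel given as a transition matrix `W[x][y]` with binary input, evaluates $H(X) - H(X|Y)$, and maximizes it by a grid search over $P(X=1)$ (a crude stand-in for the Blahut-Arimoto algorithm). For a binary symmetric channel with crossover probability 0.1 the maximum is $1 - h(0.1) \approx 0.531$ bits, attained by the uniform input.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def h_minus_conditional(px, W):
    """Return H(X) - H(X|Y) for input distribution px and channel matrix W[x][y]."""
    nx, ny = len(px), len(W[0])
    joint = [[px[x] * W[x][y] for y in range(ny)] for x in range(nx)]
    py = [sum(joint[x][y] for x in range(nx)) for y in range(ny)]
    # H(X|Y) = sum over y of p(y) * H(X | Y = y)
    h_x_given_y = sum(
        py[y] * entropy([joint[x][y] / py[y] for x in range(nx)])
        for y in range(ny) if py[y] > 0
    )
    return entropy(px) - h_x_given_y

def binary_input_capacity(W, steps=1000):
    """Grid search for the maximum of H(X) - H(X|Y) over q = P(X = 1)."""
    best_q, best = 0.0, -1.0
    for k in range(steps + 1):
        q = k / steps
        value = h_minus_conditional([1 - q, q], W)
        if value > best:
            best_q, best = q, value
    return best_q, best

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]
q, cap = binary_input_capacity(bsc)
print(q, round(cap, 4))
```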

Remark 1: When W is stationary ergodic, it is sufficient that the supremum on the right 
hand side of (1) and (3) is taken over all stationary ergodic sources. When W is stationary 
memoryless, it is sufficient that the supremum on the right hand side of (1) and (3) is taken 
over all stationary memoryless sources. For these reasons, the lemma is trivial in these cases. 
In this paper, we construct a channel code whose rate is close to the channel capacity given by (3). The constructed code is given by a pair consisting of a stochastic encoder $\Phi_n : \mathcal{M}_n \to \mathcal{X}^n$ and a decoder $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$, where $\mathcal{M}_n$ is a set of messages.

Remark 2: It should be noted that the capacity formulas (1) and (3) are satisfied when a 
stochastic encoder is allowed. In fact, by considering the average over stochastic encoders 
and using the random coding argument, we can construct a deterministic encoder from a 
stochastic encoder. Thus the rate of the stochastic encoder should be upper bounded by the 
channel capacity. On the other hand, the channel capacity is achievable with a stochastic 
encoder because a deterministic encoder is one type of stochastic encoder. 

B. Rate-Distortion Region 

In the following, we introduce the achievable rate-distortion region for a general source. Let $\mathcal{Y}^n$ be a source alphabet and $\mathcal{X}^n$ be a reproduction alphabet². Let $d_n : \mathcal{X}^n \times \mathcal{Y}^n \to [0,\infty)$ be a distortion function.

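Definition 2 ([14, Def. 5.3.1]): We call a pair $(R,D)$ consisting of a rate $R$ and a distortion $D$ achievable if for all $\delta > 0$ and all sufficiently large $n$ there is a pair consisting of an encoder $\varphi_n : \mathcal{Y}^n \to \mathcal{M}_n$ and a decoder $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$ such that

$$\frac{1}{n}\log|\mathcal{M}_n| \leq R \tag{4}$$
$$P\left(d_n(\psi_n(\varphi_n(Y^n)), Y^n) > D\right) \leq \delta. \tag{5}$$

The achievable rate-distortion region $\mathcal{R}(\mathbf{Y})$ is defined as the set of all achievable pairs $(R,D)$.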

Remark 3: It should be noted that the factor $1/n$ appears in [14, Def. 5.3.1]. This difference is not essential because we can replace $d_n$ by $d_n/n$ throughout this paper.

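For a pair $(\mathbf{X},\mathbf{Y})$ of general sources, let $\overline{D}(\mathbf{X},\mathbf{Y})$ be defined as

$$\overline{D}(\mathbf{X},\mathbf{Y}) \equiv \inf\left\{\theta : \lim_{n\to\infty}P\left(d_n(X^n,Y^n) > \theta\right) = 0\right\}.$$

²It should be noted that the roles of $\mathcal{X}^n$ and $\mathcal{Y}^n$ are the reverse of those in the conventional definition of rate-distortion theory.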
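For a general source $\mathbf{Y}$, the rate-distortion region $\mathcal{R}(\mathbf{Y})$ is derived in [27][14, Theorem 5.4.1]³ as

$$\mathcal{R}(\mathbf{Y}) = \bigcup_{\mathbf{W}}\left\{(R,D) : \begin{array}{l}\overline{I}(\mathbf{X};\mathbf{Y}) \leq R\\ \overline{D}(\mathbf{X},\mathbf{Y}) \leq D\end{array}\right\}, \tag{6}$$

where the union is taken over all general channels $\mathbf{W} \equiv \{\mu_{X^n|Y^n}\}_{n=1}^{\infty}$ and the joint distribution $\mu_{X^nY^n}$ is given as

$$\mu_{X^nY^n}(x,y) \equiv \mu_{X^n|Y^n}(x|y)\,\mu_{Y^n}(y). \tag{7}$$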

We introduce the following lemma, which is proved in Section VII-B. 

Lemma 2: For a general source $\mathbf{Y}$,

$$\mathcal{R}(\mathbf{Y}) = \bigcup_{\mathbf{W}}\left\{(R,D) : \begin{array}{l}\overline{H}(\mathbf{X}) - \underline{H}(\mathbf{X}|\mathbf{Y}) \leq R\\ \overline{D}(\mathbf{X},\mathbf{Y}) \leq D\end{array}\right\}, \tag{8}$$

where the union is taken over all general channels $\mathbf{W}$ and the joint distribution of $(\mathbf{X},\mathbf{Y})$ is given as (7).

Remark 4: When $\mathbf{Y}$ is stationary ergodic, it is sufficient that the union on the right-hand side of (6) and (8) is taken over all stationary ergodic channels. When $\mathbf{Y}$ is stationary memoryless, it is sufficient that the union on the right-hand side of (6) and (8) is taken over all stationary memoryless channels. For these reasons, the lemma is trivial in these cases.

In this paper, we construct a fixed-rate lossy source code, where $(R,D)$ is close to the boundary of the region given by the right-hand side of (8). The constructed code is given by a pair consisting of a stochastic encoder $\Phi_n : \mathcal{Y}^n \to \mathcal{M}_n$ and a decoder $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$, where $\mathcal{M}_n$ is a set of codewords.

Remark 5: Similarly to Remark 2, formulas (6) and (8) remain valid when a stochastic encoder is allowed. In fact, by considering the average over the stochastic encoders and using the random coding argument, we can construct a deterministic encoder from a stochastic encoder without any loss of encoding rate. Thus the rate-distortion pair of a stochastic encoder is in the rate-distortion region. On the other hand, the rate-distortion region is achievable with a stochastic encoder because a deterministic encoder is one type of stochastic encoder.

³The rate-distortion function, which is the infimum of $R$ such that $(R,D)$ is achievable for a given $D$, is derived in [27][14, Theorem 5.4.1].
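A single-letter illustration of (8), under assumptions chosen only for this example: a uniform binary source $\mathbf{Y}$, per-letter Hamming distortion normalized by $n$ (cf. Remark 3), and the memoryless test channel $X = Y \oplus N$ with $N \sim$ Bernoulli($D$), $D \leq 1/2$. Then $X$ is uniform, $H(X) - H(X|Y) = 1 - h(D)$, and the expected per-letter distortion is $D$, which recovers the familiar rate-distortion function $1 - h(D)$ of a binary symmetric source. The function name is hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def single_letter_quantities(D):
    """H(X), H(X|Y), E[d(X,Y)] for Y ~ Bern(1/2), X = Y xor N, N ~ Bern(D)."""
    # joint[x][y] = P(X = x, Y = y)
    joint = [[0.5 * (D if x != y else 1 - D) for y in (0, 1)] for x in (0, 1)]
    px = [sum(row) for row in joint]
    py = [joint[0][y] + joint[1][y] for y in (0, 1)]
    h_x_given_y = sum(py[y] * entropy([joint[x][y] / py[y] for x in (0, 1)]) for y in (0, 1))
    expected_distortion = joint[0][1] + joint[1][0]  # per-letter Hamming distortion
    return entropy(px), h_x_given_y, expected_distortion

for D in (0.05, 0.11, 0.25):
    hx, hxy, dist = single_letter_quantities(D)
    print(D, round(hx - hxy, 4), round(dist, 4))
```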
Remark 6: It should also be noted that results similar to those in this paper are obtained by assuming

$$d_{\max} \equiv \max_{n,x,y}d_n(x,y) < \infty,$$

where (5) is replaced by the average distortion criterion

$$E\left[d_n(\psi_n(\varphi_n(Y^n)), Y^n)\right] \leq D.$$

III. (α, β)-Hash Property

In this section, we review the hash property⁴ introduced in [22][23][24][25] and its implications.

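Throughout this paper, we use the following definitions and notations. The set difference is denoted by $\mathcal{U}\setminus\mathcal{V} \equiv \mathcal{U}\cap\mathcal{V}^c$. Let $Au$ denote the value taken by a function $A : \mathcal{U}^n \to \overline{\mathcal{U}}$ at $u \in \mathcal{U}^n$, where $\mathcal{U}^n$ is the domain of $A$ and $\overline{\mathcal{U}}$ is the range of $A$. It should be noted that $A$ may be nonlinear. When $A$ is a linear function expressed by an $l \times n$ matrix, we assume that $\mathcal{U} \equiv \mathrm{GF}(q)$ is a finite field and the range of functions is $\mathcal{U}^l$. For a set $\mathcal{A}$ of functions, let $\mathrm{Im}A$ and $\mathrm{Im}\mathcal{A}$ be defined as

$$\mathrm{Im}A \equiv \{Au : u \in \mathcal{U}^n\}$$
$$\mathrm{Im}\mathcal{A} \equiv \bigcup_{A\in\mathcal{A}}\mathrm{Im}A.$$

We define the sets $\mathcal{C}_A(c)$ and $\mathcal{C}_{AB}(c,m)$ as

$$\mathcal{C}_A(c) \equiv \{u : Au = c\}$$
$$\mathcal{C}_{AB}(c,m) \equiv \{u : Au = c,\ Bu = m\}.$$

In the context of linear codes, $\mathcal{C}_A(c)$ is called a coset determined by $c$. The random variables of a function $A$ and a vector $c \in \mathrm{Im}\mathcal{A}$ are denoted by the sans serif letters $\mathsf{A}$ and $\mathsf{c}$, respectively. It should be noted that the random variable of an $n$-dimensional vector $u \in \mathcal{U}^n$ is denoted by the Roman letter $U$, which does not represent a function; this is the way it has been used so far.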
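The sets defined above can be enumerated by brute force for a small example. The sketch assumes an arbitrary $3 \times 4$ binary matrix $A$ of rank 2 and a $1 \times 4$ matrix $B$ over GF(2) (hypothetical values); it lists $\mathrm{Im}A$, the cosets $\mathcal{C}_A(c)$, and one set $\mathcal{C}_{AB}(c,m)$. For a linear $A$ the cosets partition $\mathcal{U}^n$ into $|\mathrm{Im}A|$ pieces of equal size $q^n/|\mathrm{Im}A|$.

```python
import itertools
from collections import defaultdict

def mat_vec(M, u, q=2):
    """Mu over GF(q) for a prime q; M is a list of row tuples."""
    return tuple(sum(a * b for a, b in zip(row, u)) % q for row in M)

def cosets(A, n, q=2):
    """Map each c in Im A to the list C_A(c) = {u : Au = c}."""
    table = defaultdict(list)
    for u in itertools.product(range(q), repeat=n):
        table[mat_vec(A, u, q)].append(u)
    return dict(table)

def joint_coset(A, B, c, m, n, q=2):
    """C_AB(c, m) = {u : Au = c, Bu = m}."""
    return [u for u in itertools.product(range(q), repeat=n)
            if mat_vec(A, u, q) == c and mat_vec(B, u, q) == m]

A = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)]  # third row = first + second, so rank 2
B = [(1, 1, 1, 1)]
table = cosets(A, 4)
print(len(table), sorted(len(v) for v in table.values()))  # |Im A| and the coset sizes
print(joint_coset(A, B, (0, 0, 0), (1,), 4))
```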

⁴In [22][23][24][25], it is called the 'strong hash property.' Throughout this paper, we call it simply the 'hash property.'
A. Formal Definition and Basic Properties

Here, we introduce the hash property for an ensemble of functions. It requires stronger 
conditions than those introduced in [21]. 

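Definition 3: Let $\mathcal{A} \equiv \{\mathcal{A}_n\}_{n=1}^{\infty}$ be a sequence of sets such that $\mathcal{A}_n$ is a set of functions $A : \mathcal{U}^n \to \mathrm{Im}\mathcal{A}_n$. For a probability distribution $p_{A,n}$ on $\mathcal{A}_n$, we call a sequence $(\mathcal{A}, p_A) \equiv \{(\mathcal{A}_n, p_{A,n})\}_{n=1}^{\infty}$ an ensemble. Then, $(\mathcal{A}, p_A)$ has an $(\alpha_A, \beta_A)$-hash property if there are two sequences $\alpha_A \equiv \{\alpha_A(n)\}_{n=1}^{\infty}$ and $\beta_A \equiv \{\beta_A(n)\}_{n=1}^{\infty}$, depending on $\{p_{A,n}\}_{n=1}^{\infty}$, such that

$$\lim_{n\to\infty}\alpha_A(n) = 1 \tag{H1}$$
$$\lim_{n\to\infty}\beta_A(n) = 0 \tag{H2}$$

and

$$\sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,n}(\{A : Au = Au'\}) > \frac{\alpha_A(n)}{|\mathrm{Im}\mathcal{A}_n|}}} p_{A,n}(\{A : Au = Au'\}) \leq \beta_A(n) \tag{H3}$$

for any $n$ and $u \in \mathcal{U}^n$. Throughout this paper, we omit the dependence of $\mathcal{A}$, $p_A$, $\alpha_A$, and $\beta_A$ on $n$.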

Remark 7: In [21][23], an ensemble is required to satisfy the condition

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\overline{\mathcal{U}}_n|}{|\mathrm{Im}\mathcal{A}_n|} = 0,$$

where $\overline{\mathcal{U}}_n$ is the range of functions. This condition is omitted because it is unnecessary for the results reported in this paper.

Let us remark on the condition (H3). This condition requires that the sum of the collision probabilities $p_A(\{A : Au = Au'\})$ that are greater than $\alpha_A/|\mathrm{Im}\mathcal{A}|$ be bounded by $\beta_A$, where the sum is taken over all $u'$ except $u$. An intuitive interpretation of (H3) will be provided in Section III-B by using an ensemble of sparse matrices. It should be noted that this condition implies

$$\sum_{\substack{u\in\mathcal{T}\\ u'\in\mathcal{T}'}}p_A(\{A : Au = Au'\}) \leq |\mathcal{T}\cap\mathcal{T}'| + \frac{|\mathcal{T}||\mathcal{T}'|\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \min\{|\mathcal{T}|,|\mathcal{T}'|\}\beta_A \tag{H3'}$$

for any $\mathcal{T},\mathcal{T}' \subset \mathcal{U}^n$, which is introduced in [21]. The stronger condition (H3) is required for Lemmas 3 and 5, which appear later. The proof of (H3') is given in Appendix B for completeness.

It should be noted that when $\mathcal{A}$ is a two-universal class of hash functions [9] and $p_A$ is the uniform distribution on $\mathcal{A}$, then $(\mathcal{A}, p_A)$ has a $(\mathbf{1},\mathbf{0})$-hash property, where $\mathbf{1}$ and $\mathbf{0}$ denote the constant sequences of 1 and 0, respectively. Random bin coding [4] and the set of all linear functions [6] are examples of two-universal classes of hash functions. An ensemble of sparse matrices satisfying a hash property is introduced in Section III-B.
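To see why the set of all linear functions has a $(\mathbf{1},\mathbf{0})$-hash property, note that for $u \neq u'$ and a uniformly random $l \times n$ matrix $A$ over GF($q$), $P(Au = Au') = P(A(u - u') = \mathbf{0}) = q^{-l} = 1/|\mathrm{Im}\mathcal{A}|$. Hence no term in the sum in (H3) exceeds $\alpha_A/|\mathrm{Im}\mathcal{A}|$ with $\alpha_A = 1$, the sum is empty, and $\beta_A = 0$ works. The sketch below, restricted (as an assumption) to GF(2) and sizes small enough to enumerate all $2^{ln}$ matrices, checks this collision probability exactly.

```python
import itertools
from fractions import Fraction

def mat_vec(A, u):
    """Au over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, u)) % 2 for row in A)

def all_matrices(l, n):
    """Enumerate every l x n binary matrix as a tuple of row tuples."""
    rows = list(itertools.product((0, 1), repeat=n))
    return itertools.product(rows, repeat=l)

def collision_probability(u, v, l, n):
    """Exact P(Au = Av) for A uniform over all l x n binary matrices."""
    hits = total = 0
    for A in all_matrices(l, n):
        total += 1
        hits += mat_vec(A, u) == mat_vec(A, v)
    return Fraction(hits, total)

l, n = 2, 3
vectors = list(itertools.product((0, 1), repeat=n))
probs = {collision_probability(u, v, l, n) for u in vectors for v in vectors if u != v}
print(probs)  # every distinct pair collides with probability 1/2**l
```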

We have the following lemma, where it is unnecessary to assume the linearity of functions assumed in [21]. The proof is given in Appendix C for completeness.

Lemma 3 ([22, Lemma 4]): Let $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$ be ensembles satisfying an $(\alpha_A, \beta_A)$-hash property and an $(\alpha_B, \beta_B)$-hash property, respectively. Let $A \in \mathcal{A}$ (resp. $B \in \mathcal{B}$) be a function $A : \mathcal{U}^n \to \mathrm{Im}A$ (resp. $B : \mathcal{U}^n \to \mathrm{Im}B$). Let $(A,B) \in \mathcal{A}\times\mathcal{B}$ be a function defined as

$$(A,B)u \equiv (Au, Bu) \quad \text{for each } u \in \mathcal{U}^n.$$

Let $p_{AB}$ be a joint distribution on $\mathcal{A}\times\mathcal{B}$ defined as

$$p_{AB}(A,B) \equiv p_A(A)\,p_B(B) \quad \text{for each } (A,B) \in \mathcal{A}\times\mathcal{B}.$$

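Then the ensemble $(\mathcal{A}\times\mathcal{B}, p_{AB})$ has an $(\alpha_{AB}, \beta_{AB})$-hash property, where $(\alpha_{AB}, \beta_{AB})$ is defined as

$$\alpha_{AB} \equiv \alpha_A\alpha_B$$
$$\beta_{AB} \equiv \beta_A + \beta_B.$$

The following lemma is related to the collision-resistance property, that is, if the number of bins is greater than the number of items then there is an assignment such that every bin contains at most one item. The proof is given in Appendix D for completeness.

Lemma 4 ([21, Lemma 1]): If $(\mathcal{A}, p_A)$ satisfies (H3'), then

$$p_A\left(\left\{A : [\mathcal{G}\setminus\{u\}]\cap\mathcal{C}_A(Au) \neq \emptyset\right\}\right) \leq \frac{|\mathcal{G}|\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A$$

for all $\mathcal{G} \subset \mathcal{U}^n$ and $u \in \mathcal{U}^n$.

We show the collision-resistance property from Lemma 4. Let $\mu_U$ be a probability distribution on $\mathcal{G} \subset \mathcal{U}^n$. We have

$$\begin{aligned} E_{\mathsf{A}}\left[\mu_U\left(\left\{u : [\mathcal{G}\setminus\{u\}]\cap\mathcal{C}_{\mathsf{A}}(\mathsf{A}u) \neq \emptyset\right\}\right)\right] &= \sum_{u\in\mathcal{G}}\mu_U(u)\,p_A\left(\left\{A : [\mathcal{G}\setminus\{u\}]\cap\mathcal{C}_A(Au) \neq \emptyset\right\}\right)\\ &\leq \frac{|\mathcal{G}|\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A. \end{aligned} \tag{9}$$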
By assuming that $|\mathcal{G}|/|\mathrm{Im}\mathcal{A}|$ vanishes as $n \to \infty$, we have the fact that there is a function $A$ such that

$$\mu_U\left(\left\{u : [\mathcal{G}\setminus\{u\}]\cap\mathcal{C}_A(Au) \neq \emptyset\right\}\right) < \delta$$

for any $\delta > 0$ and sufficiently large $n$, because $\alpha_A \to 1$ and $\beta_A \to 0$ by (H1) and (H2). Since the relation $[\mathcal{G}\setminus\{u\}]\cap\mathcal{C}_A(Au) \neq \emptyset$ corresponds to the event that there is $u' \in \mathcal{G}$ such that $u$ and $u'$ are different members of the same bin (they have the same codeword determined by $A$), the members of $\mathcal{G}$ are located in different bins (the members of $\mathcal{G}$ can be decoded correctly) with probability close to one.
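A quick Monte Carlo check of (9), assuming the ensemble of all binary $l \times n$ matrices (so $\alpha_A = 1$ and $\beta_A = 0$) and a uniform $\mu_U$ on a randomly chosen set $\mathcal{G}$: the estimated probability that a member of $\mathcal{G}$ shares its bin with another member should be at most about $|\mathcal{G}|/2^l$. The set size, block length, trial count, and seed are arbitrary choices for illustration, and sequences are stored as $n$-bit integers.

```python
import random

def random_binary_matrix(l, n, rng):
    """Uniformly random l x n binary matrix; each row is stored as an n-bit integer."""
    return [rng.getrandbits(n) for _ in range(l)]

def bin_index(A, u):
    """Au over GF(2) for u given as an n-bit integer (each bit is a parity of row AND u)."""
    return tuple(bin(row & u).count("1") % 2 for row in A)

def collision_rate(G, l, n, trials, rng):
    """Monte Carlo estimate of E_A[mu_U({u : another member of G shares the bin of u})]
    with mu_U uniform on G."""
    total = 0.0
    for _ in range(trials):
        A = random_binary_matrix(l, n, rng)
        bins = {}
        for u in G:
            bins.setdefault(bin_index(A, u), []).append(u)
        collided = sum(len(members) for members in bins.values() if len(members) > 1)
        total += collided / len(G)
    return total / trials

rng = random.Random(1)
n, l = 20, 12
G = rng.sample(range(2 ** n), 16)  # a set of |G| = 16 distinct sequences
estimate = collision_rate(G, l, n, trials=2000, rng=rng)
bound = len(G) / 2 ** l  # right-hand side of (9) with alpha_A = 1, beta_A = 0
print(round(estimate, 5), "vs bound", round(bound, 5))
```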

The following lemma is related to the balanced coloring property, which is analogous to [2, Lemma 3.1][7, Lemma 17.3]. This lemma implies that there is a function $A$ such that $\mathcal{T}$ is almost equally partitioned by $A$ with respect to a measure $Q$. We use this property instead of the saturation property [21], that is, if the number of items is greater than the number of bins then there is an assignment such that every bin contains at least one item. The proof is given in Appendix E for completeness.

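Lemma 5 ([24, Lemma 4]): If $(\mathcal{A}, p_A)$ satisfies (H3), then

$$E_{\mathsf{A}}\left[\sum_{c\in\mathrm{Im}\mathcal{A}}\left|\frac{Q(\mathcal{T}\cap\mathcal{C}_{\mathsf{A}}(c))}{Q(\mathcal{T})} - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right|\right] \leq \sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]\,|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)}{Q(\mathcal{T})}} \tag{10}$$

for any function $Q : \mathcal{U}^n \to [0,\infty)$ and $\mathcal{T} \subset \mathcal{U}^n$, where

$$Q(\mathcal{T}) \equiv \sum_{u\in\mathcal{T}}Q(u).$$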

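Remark 8: In [2, Lemma 3.1]⁵ and [7, Lemma 17.3], the absolute value on the left-hand side of (10) is upper-bounded by $\varepsilon/|\mathrm{Im}\mathcal{A}|$ for all $c \in \mathrm{Im}\mathcal{A}$ and $Q \in \mathcal{Q}$ provided that $\varepsilon^2 > 3|\mathrm{Im}\mathcal{A}|\log(2|\mathrm{Im}\mathcal{A}||\mathcal{Q}|)\max_{u\in\mathcal{T}}Q(u)$, where $\mathcal{Q}$ is a finite set of probability distributions.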

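We show the balanced coloring property. From Lemma 5, we have the fact that there is a function $A$ such that

$$\sum_{c\in\mathrm{Im}\mathcal{A}}\left|\frac{Q(\mathcal{T}\cap\mathcal{C}_A(c))}{Q(\mathcal{T})} - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right| \leq \sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]\,|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)}{Q(\mathcal{T})}}.$$

By assuming that $Q(\mathcal{T})$ is bounded away from zero and $|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)$ vanishes as $n \to \infty$, we have, by (H1) and (H2),

$$\left|\frac{Q(\mathcal{T}\cap\mathcal{C}_A(c))}{Q(\mathcal{T})} - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right| \leq \sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]\,|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)}{Q(\mathcal{T})}} < \delta \tag{11}$$

for all $c \in \mathrm{Im}\mathcal{A}$, $\delta > 0$, and sufficiently large $n$. Since $\{\mathcal{T}\cap\mathcal{C}_A(c)\}_{c\in\mathrm{Im}\mathcal{A}}$ is a partition of $\mathcal{T}$, the set $\mathcal{T}$ is almost equally partitioned with respect to the measure $Q$, where $c$ represents the color of the set $\mathcal{T}\cap\mathcal{C}_A(c)$.

⁵See also [8, Remark on Lemma B.1].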
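The balanced coloring property can be observed numerically. The sketch assumes the ensemble of all binary $l \times n$ matrices ($\alpha_A = 1$, $\beta_A = 0$, so the right-hand side of (10) becomes $\sqrt{|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)/Q(\mathcal{T})}$), takes $\mathcal{T} = \{0,1\}^{12}$, and uses a hypothetical weight $Q$ given by an i.i.d. Bernoulli(0.3) distribution. It averages $\sum_c |Q(\mathcal{T}\cap\mathcal{C}_A(c))/Q(\mathcal{T}) - 1/|\mathrm{Im}\mathcal{A}||$ over a few random draws of $A$ and compares the result with that bound.

```python
import math
import random

def parity(x):
    """Parity of the number of 1 bits in the integer x."""
    return bin(x).count("1") % 2

def coloring_imbalance(A, Q, n, l):
    """sum_c |Q(T ∩ C_A(c)) / Q(T) - 1 / 2**l| with T = {0,1}^n (sequences as n-bit ints)."""
    mass = [0.0] * (2 ** l)
    for u in range(2 ** n):
        c = 0
        for row in A:  # build the color c = Au one bit at a time
            c = (c << 1) | parity(row & u)
        mass[c] += Q[u]
    total = sum(mass)
    return sum(abs(m / total - 1 / 2 ** l) for m in mass)

n, l, p = 12, 3, 0.3
Q = [p ** bin(u).count("1") * (1 - p) ** (n - bin(u).count("1")) for u in range(2 ** n)]
rng = random.Random(2)
trials = 20
average = sum(
    coloring_imbalance([rng.getrandbits(n) for _ in range(l)], Q, n, l) for _ in range(trials)
) / trials
bound = math.sqrt(2 ** l * max(Q) / sum(Q))  # right-hand side of (10) with alpha_A = 1, beta_A = 0
print(round(average, 4), "<=", round(bound, 4))
```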

B. Hash Property for Ensembles of Matrices

In the following, we discuss the hash property for an ensemble of matrices. 

In the last section, we mentioned that the uniform distribution on the set of all linear functions has a strong $(\mathbf{1},\mathbf{0})$-hash property because it is a two-universal class of hash functions. In the following, we introduce another ensemble of matrices.

First, we introduce the average spectrum of an ensemble of matrices given in [3]. Let $\mathcal{U}$ be a finite field and $\mathcal{A}$ be a set of linear functions $A : \mathcal{U}^n \to \mathcal{U}^l$. It should be noted again that $A$ can be expressed by an $l \times n$ matrix.

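Let $t(u)$ be the type⁶ of $u \in \mathcal{U}^n$, which is characterized by the empirical probability distribution of the sequence $u$. Let $\mathcal{H}$ be the set of all types of length $n$ except $t(\mathbf{0})$, where $\mathbf{0}$ is the zero vector. For a probability distribution $p_A$ on a set of $l \times n$ matrices and a type $t$, let $S(p_A, t)$ be defined as

$$S(p_A, t) \equiv \sum_{A\in\mathcal{A}}p_A(A)\left|\{u \in \mathcal{U}^n : Au = \mathbf{0},\ t(u) = t\}\right|,$$

which is called the expected number of codewords that have type $t$ in the context of linear codes. For a given $\mathcal{H}_A \subset \mathcal{H}$, we define $\alpha_A(n)$ and $\beta_A(n)$ as

$$\alpha_A(n) \equiv \frac{|\mathrm{Im}\mathcal{A}|}{|\mathcal{U}|^l}\cdot\max_{t\in\mathcal{H}_A}\frac{S(p_A,t)}{S(u_A,t)} \tag{12}$$
$$\beta_A(n) \equiv \sum_{t\in\mathcal{H}\setminus\mathcal{H}_A}S(p_A,t), \tag{13}$$

where $u_A$ denotes the uniform distribution on the set of all $l \times n$ matrices.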
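For the uniform distribution $u_A$ on all $l \times n$ matrices over GF(2), $P(Au = \mathbf{0}) = 2^{-l}$ for every nonzero $u$, so $S(u_A, t) = |\{u : t(u) = t\}|\,2^{-l}$; since a binary type is determined by the Hamming weight $w$, this is $\binom{n}{w}2^{-l}$. The sketch below computes $S(p_A, t)$ directly from its definition for any finite ensemble given as (probability, matrix) pairs, and confirms the formula for the uniform ensemble with tiny, illustrative values of $n$ and $l$.

```python
import itertools
from fractions import Fraction
from math import comb

def average_spectrum(ensemble, n):
    """S(p_A, t) computed from its definition for binary sequences.

    Over GF(2) the type t(u) is determined by the Hamming weight w of u, so the result
    is indexed by w = 1, ..., n (weight 0 is the excluded type t(0)).
    `ensemble` is a list of (probability, matrix) pairs; a matrix is a tuple of rows.
    """
    spectrum = {w: Fraction(0) for w in range(1, n + 1)}
    for prob, A in ensemble:
        for u in itertools.product((0, 1), repeat=n):
            w = sum(u)
            if w > 0 and all(sum(a * b for a, b in zip(row, u)) % 2 == 0 for row in A):
                spectrum[w] += prob
    return spectrum

l, n = 2, 4
rows = list(itertools.product((0, 1), repeat=n))
uniform = [(Fraction(1, 2 ** (l * n)), A) for A in itertools.product(rows, repeat=l)]
S = average_spectrum(uniform, n)
for w in S:
    print(w, S[w], Fraction(comb(n, w), 2 ** l))  # enumerated value vs C(n, w) / 2**l
```

A sparse ensemble would be passed in the same way, and (12) then compares its high-weight spectrum against these uniform values.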

The following lemma provides a sufficient condition for an ensemble of matrices to satisfy a strong hash property. The proof is given in Appendix F for completeness.

⁶In [21], it is called a histogram, which is characterized by the number of occurrences of each symbol in the sequence $u$. The type and the histogram are essentially the same when $n$ is fixed.
Lemma 6 ([22, Theorem 1]): Let $(\mathcal{A}, p_A)$ be an ensemble of matrices and assume that $p_A(\{A : Au = \mathbf{0}\})$ depends on $u$ only through the type $t(u)$. If $(\alpha_A, \beta_A)$, defined by (12) and (13), satisfies (H1) and (H2), then $(\mathcal{A}, p_A)$ has a strong $(\alpha_A, \beta_A)$-hash property.

Next, we introduce the ensemble of $q$-ary sparse matrices introduced in [21], which is the $q$-ary extension of the ensemble proposed in [18]. Let $\mathcal{U} \equiv \mathrm{GF}(q)$ and $l \equiv nR$ when $0 < R < 1$ is given, where $q$ is a prime number or a power of a prime number. We generate an $l \times n$ matrix $A$ with the following procedure, where at most $\tau$ random nonzero elements are introduced in every column.

1) Start from an all-zero matrix.

2) For each $i \in \{1, \ldots, n\}$, repeat the following procedure $\tau$ times:

a) Choose $(j, a) \in \{1, \ldots, l\} \times [\mathrm{GF}(q)\setminus\{0\}]$ uniformly at random.

b) Add⁷ $a$ to the $(j, i)$-element of $A$.

Assume that $\tau = O(\log n)$ is even and let $(\mathcal{A}, p_A)$ be an ensemble corresponding to the above procedure. Let $\mathcal{H}_A \subset \mathcal{H}$ be a set of types satisfying the requirement that the weight (the number of occurrences of nonzero elements) is large enough. Let $(\alpha_A, \beta_A)$ be defined by (12) and (13). Then $\alpha_A$ measures the difference between the ensemble $(\mathcal{A}, p_A)$ and the ensemble of all $l \times n$ matrices with respect to the high-weight part of the average spectrum, and $\beta_A$ provides an upper bound on the probability that the code $\{u \in \mathcal{U}^n : Au = \mathbf{0}\}$ has low-weight codewords. It is proved in [21, Theorem 2] that $(\alpha_A, \beta_A)$ satisfies (H1) and (H2) if we adopt an appropriate $\mathcal{H}_A$. Then, from Lemma 6, this ensemble has a strong $(\alpha_A, \beta_A)$-hash property. It should be noted that the convergence speed of $(\alpha_A, \beta_A)$ depends on how fast $\tau$ grows in relation to the block length. The analysis of $(\alpha_A, \beta_A)$ is given in the proof of [21, Theorem 2].
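The generation procedure in steps 1) and 2) can be sketched in Python as follows. To keep the arithmetic elementary, the sketch assumes that $q$ is prime, so that GF($q$) is the integers modulo $q$; prime-power fields would need a proper field implementation. Indices are 0-based, and the function name and example values of $l$, $n$, and $\tau$ are illustrative. Because step b) adds rather than overwrites, two additions to the same entry can cancel, which is why each column has at most $\tau$ nonzero elements.

```python
import random

def sparse_qary_matrix(l, n, tau, q, rng):
    """Generate an l x n matrix over GF(q), q prime, following steps 1) and 2)."""
    A = [[0] * n for _ in range(l)]           # 1) start from the all-zero matrix
    for i in range(n):                        # 2) for each column i ...
        for _ in range(tau):                  # ... repeat tau times:
            j = rng.randrange(l)              # a) a row index j uniformly at random
            a = rng.randrange(1, q)           #    and a nonzero element a of GF(q)
            A[j][i] = (A[j][i] + a) % q       # b) add a to the (j, i)-element
    return A

rng = random.Random(0)
A = sparse_qary_matrix(l=6, n=12, tau=4, q=3, rng=rng)
for row in A:
    print(row)
```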

IV. Construction of Channel Code 

This section introduces a channel code. The idea for the construction is drawn from [20][21][24]. It should be noted that we assume that the channel input alphabet $\mathcal{X}^n$ is a finite set but allow the channel output alphabet $\mathcal{Y}^n$ to be an arbitrary (infinite, continuous) set.

For given $r > 0$ and $R > 0$, let $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$ be ensembles of functions $A : \mathcal{X}^n \to \mathrm{Im}\mathcal{A}$ and $B : \mathcal{X}^n \to \mathrm{Im}\mathcal{B}$ satisfying

$$r = \frac{1}{n}\log|\mathrm{Im}\mathcal{A}|$$
$$R = \frac{1}{n}\log|\mathrm{Im}\mathcal{B}|,$$

respectively, where we define $\mathcal{M}_n \equiv \mathrm{Im}\mathcal{B}$ and $R$ represents the rate of the code. We fix functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ so that they are available for constructing an encoder and a decoder.

Fig. 1. Construction of Channel Code

⁷It should be noted that the $(j, i)$-element of the matrix is not overwritten by $a$ when the same $j$ is chosen again.

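We use a constrained random number generator to construct an encoder. Let $\tilde{X}_{AB}(c,m)$ be a random variable corresponding to the distribution

$$\mu_{\tilde{X}_{AB}(c,m)}(x) \equiv \begin{cases}\dfrac{\mu_{X^n}(x)}{\mu_{X^n}(\mathcal{C}_{AB}(c,m))}, & \text{if } x \in \mathcal{C}_{AB}(c,m),\\ 0, & \text{if } x \notin \mathcal{C}_{AB}(c,m),\end{cases} \tag{14}$$

where $\mu_{X^n}$ is the probability distribution of the channel input random variable $X^n$. We define the stochastic encoder $\Phi_n : \mathrm{Im}\mathcal{B} \to \mathcal{X}^n$ and the decoder $\psi_n : \mathcal{Y}^n \to \mathrm{Im}\mathcal{B}$ as

$$\Phi_n(m) \equiv \tilde{X}_{AB}(c,m) \tag{15}$$
$$\psi_n(y) \equiv Bx_A(c|y), \tag{16}$$

where we declare an encoding error when $\mu_{X^n}(\mathcal{C}_{AB}(c,m)) = 0$, and $x_A$ is defined as

$$x_A(c|y) \equiv \arg\max_{x'\in\mathcal{C}_A(c)}\mu_{X^n|Y^n}(x'|y). \tag{17}$$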

The flow of vectors is illustrated in Fig. 1. 
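To make (14) to (17) concrete, the following brute-force sketch implements the stochastic encoder and the decoder in a toy setting chosen purely for illustration: $n = 8$, an i.i.d. Bernoulli(0.3) input distribution $\mu_{X^n}$, a memoryless binary symmetric channel with crossover probability 0.05, and fixed binary matrices $A$ and $B$ (hypothetical values, not drawn from the ensembles of Section III). Exhaustive search over cosets is only feasible for tiny $n$; here it stands in for the constrained random number generator and for the maximization in (17), which is carried out on $\mu_{X^n}(x)\mu_{Y^n|X^n}(y|x)$ since that product is proportional to $\mu_{X^n|Y^n}(x|y)$.

```python
import itertools
import random

N, P_ONE, EPS = 8, 0.3, 0.05
ALL_X = list(itertools.product((0, 1), repeat=N))

def mat_vec(M, x):
    """Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)

def mu_x(x):
    """Input distribution mu_{X^n}: i.i.d. Bernoulli(P_ONE)."""
    k = sum(x)
    return P_ONE ** k * (1 - P_ONE) ** (N - k)

def channel(y, x):
    """Memoryless BSC transition probability mu_{Y^n|X^n}(y|x)."""
    d = sum(a != b for a, b in zip(x, y))
    return EPS ** d * (1 - EPS) ** (N - d)

def encode(A, B, c, m, rng):
    """Stochastic encoder (15): sample x from (14), i.e. mu_X restricted to C_AB(c, m)."""
    coset = [x for x in ALL_X if mat_vec(A, x) == c and mat_vec(B, x) == m]
    weights = [mu_x(x) for x in coset]
    if not coset or sum(weights) == 0:
        raise ValueError("encoding error: mu_X(C_AB(c, m)) = 0")
    return rng.choices(coset, weights=weights, k=1)[0]

def decode(A, B, c, y):
    """Decoder (16)-(17): maximize mu_X(x) mu_{Y|X}(y|x) over C_A(c), then apply B."""
    coset = [x for x in ALL_X if mat_vec(A, x) == c]
    x_hat = max(coset, key=lambda x: mu_x(x) * channel(y, x))
    return mat_vec(B, x_hat)

A = [(1, 1, 0, 1, 0, 0, 1, 0), (0, 1, 1, 0, 1, 0, 0, 1),
     (1, 0, 1, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1, 0, 1)]
B = [(1, 0, 0, 1, 0, 1, 0, 0), (0, 1, 0, 0, 1, 0, 1, 1)]
c, m = (1, 0, 1, 0), (1, 1)
rng = random.Random(0)
x = encode(A, B, c, m, rng)
y = tuple(b ^ (rng.random() < EPS) for b in x)  # pass x through the BSC
print(x, y, decode(A, B, c, y))
```

The six rows of $A$ and $B$ are linearly independent, so every pair $(c, m)$ has a nonempty $\mathcal{C}_{AB}(c,m)$ with four elements.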

Remark 9: It should be noted that (15) is different from the encoder defined in [21], whereas (16) is the same. In [21], the encoder is defined based on typical sets, where $x \in \mathcal{T}_{X|Y,2\gamma}(y)$ is satisfied when $x \in \mathcal{T}_{X,\gamma}$ and $y \in \mathcal{T}_{Y|X,\gamma}(x)$. We changed the definition of the encoder because a general channel may not satisfy this property.
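The error probability $\mathrm{Error}(A,B,c)$ is given by

$$\mathrm{Error}(A,B,c) \equiv \sum_{\substack{m:\\ \mu_{X^n}(\mathcal{C}_{AB}(c,m))=0}}\frac{1}{|\mathcal{M}_n|} + \sum_{\substack{m:\\ \mu_{X^n}(\mathcal{C}_{AB}(c,m))>0}}\ \sum_{\substack{x\in\mathcal{C}_{AB}(c,m)\\ y:\ \psi_n(y)\neq m}}\frac{\mu_{X^n}(x)\,\mu_{Y^n|X^n}(y|x)}{|\mathcal{M}_n|\,\mu_{X^n}(\mathcal{C}_{AB}(c,m))}. \tag{18}$$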

We have the following theorem, where the proof is given in Section VII-C. 

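Theorem 1: If $r, R > 0$ satisfy

$$r > \overline{H}(\mathbf{X}|\mathbf{Y}) \tag{19}$$
$$r + R < \underline{H}(\mathbf{X}), \tag{20}$$

then for any $\delta > 0$ and all sufficiently large $n$ there are functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ such that

$$\mathrm{Error}(A,B,c) \leq \delta. \tag{21}$$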

The channel capacity is achievable with the proposed code by letting $\mathbf{X}$ be a source that attains (or approaches) the supremum on the right-hand side of (3).

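Remark 10: From (18) and (21), we have the fact that $\mathcal{C}_{AB}(c,m) \neq \emptyset$ with probability close to 1 by letting $\delta \to 0$ because

$$\sum_{m:\ \mathcal{C}_{AB}(c,m)=\emptyset}\frac{1}{|\mathcal{M}_n|} \leq \sum_{m:\ \mu_{X^n}(\mathcal{C}_{AB}(c,m))=0}\frac{1}{|\mathcal{M}_n|} \leq \mathrm{Error}(A,B,c) \leq \delta. \tag{22}$$

Furthermore, we can find $c \in \mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$ satisfying (21) because $\mathrm{Error}(A,B,c) = 1$ when $c \in \mathrm{Im}\mathcal{A}\setminus\mathrm{Im}A$.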

Next, we consider a special case of the proposed code, which provides an interpretation of the conventional linear codes [3][10]. It should be noted that a constrained random number generator is unnecessary in this case.

Let us assume that $\mu_{X^n}$ is the uniform distribution on $\mathcal{X}^n$ and $(\mathcal{A}, p_A)$ is an ensemble of matrices $A : \mathcal{X}^n \to \mathcal{X}^l$ satisfying

$$r = \frac{1}{n}\log|\mathrm{Im}\mathcal{A}|$$

when $0 < r < \log|\mathcal{X}|$ is given. We fix a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$ so that they are available for constructing an encoder and a decoder.

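Let $\mathcal{M}_n$ be a set of all messages that is a linear space satisfying $|\mathcal{M}_n| = |\mathcal{C}_A(c)|$ for all $c \in \mathrm{Im}A$. Since $A$ is a linear function, there is a bijective linear function $G : \mathcal{M}_n \to \mathcal{C}_A(\mathbf{0})$, which is known as a generator matrix. The rate $R$ of the code is given as

$$R = \frac{1}{n}\log|\mathcal{M}_n|.$$

Since for a given $c \in \mathrm{Im}A$ there is $x_c$ such that $Ax_c = c$, we have $A[Gm + x_c] = c$ for all $m \in \mathcal{M}_n$. Since $G$ is an injective linear function, there is a linear function $B : \mathcal{X}^n \to \mathcal{M}_n$ such that $BGm = m$ for all $m \in \mathcal{M}_n$. We define a deterministic encoder $\varphi_n : \mathcal{M}_n \to \mathcal{C}_A(c)$ and a decoder $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$ as

$$\varphi_n(m) \equiv Gm + x_c$$
$$\psi_n(y) \equiv B[x_A(c|y) - x_c],$$

where $x_A$ is defined as (17). The error probability $\mathrm{Error}(A,c)$ is given by

$$\mathrm{Error}(A,c) \equiv \sum_{\substack{x\in\mathcal{C}_A(c),\ y:\\ \varphi_n(\psi_n(y))\neq x}}\frac{\mu_{Y^n|X^n}(y|x)}{|\mathcal{M}_n|}.$$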
We have the following corollary, which is shown in Section VII-D. 
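Corollary 2: If $r$ satisfies

$$\overline{H}(\mathbf{X}|\mathbf{Y}) < r < \log|\mathcal{X}|, \tag{23}$$

then for any $\delta > 0$ and all sufficiently large $n$ there are a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}A$ such that

$$R \geq \log|\mathcal{X}| - r \tag{24}$$
$$\mathrm{Error}(A,c) \leq \delta. \tag{25}$$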

When the supremum on the right-hand side of (3) is achieved by $\mathbf{X}$ corresponding to the uniform distribution, the channel capacity

$$C(\mathbf{W}) = \log|\mathcal{X}| - \overline{H}(\mathbf{X}|\mathbf{Y}) \tag{26}$$

is achievable with the proposed code by letting $r \to \overline{H}(\mathbf{X}|\mathbf{Y})$. Assuming that $\mathcal{X} = \mathcal{Y} = \mathcal{Z}$ is a finite field, the capacity

$$C(\mathbf{W}) = \log|\mathcal{X}| - \overline{H}(\mathbf{Z}) \tag{27}$$

for a channel with additive noise $\mathbf{Z} \equiv \{Y^n - X^n\}_{n=1}^{\infty}$ is achievable with the proposed code by letting $r \to \overline{H}(\mathbf{Z})$.

Remark 11: In [5, Theorem 7.2.1], the capacity of a discrete stationary memoryless weakly symmetric channel is given as

$$C(\mathbf{W}) = \log|\mathcal{Y}| - H(\text{row of the transition matrix}),$$

which is another expression of (26). It should be noted that formula (26) is valid for a weakly symmetric channel and is well-defined as long as $|\mathcal{X}|$ is finite. It should also be noted that the capacity (26) for a symmetric output channel (e.g., an additive Gaussian noise channel) is achieved by $\mathbf{X}$ corresponding to the uniform distribution. For a channel with additive noise $\mathbf{Z}$, the channel capacity (27) is derived in [31][14, Example 3.2.1] when $\mathcal{X} = \mathcal{Y} = \mathcal{Z} = \{0,1\}$. Formula (27) is an extension to a general finite alphabet.

V. Construction of Lossy Source Code 

This section introduces a lossy source code. The idea for the construction is drawn from [20][21][24]. It should be noted that we assume that the reproduction alphabet $\mathcal{X}^n$ is a finite set but allow the source alphabet $\mathcal{Y}^n$ to be an arbitrary (infinite, continuous) set.

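For given $r > 0$ and $R > 0$, let $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$ be ensembles of functions $A : \mathcal{X}^n \to \mathrm{Im}\mathcal{A}$ and $B : \mathcal{X}^n \to \mathrm{Im}\mathcal{B}$ satisfying

$$r = \frac{1}{n}\log|\mathrm{Im}\mathcal{A}|$$
$$R = \frac{1}{n}\log|\mathrm{Im}\mathcal{B}|,$$

respectively. We fix functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ so that they are available for constructing an encoder and a decoder.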
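Let $\mu_{X^n}$ be defined as

$$\mu_{X^n}(x) \equiv \sum_{y\in\mathcal{Y}^n}\mu_{X^n|Y^n}(x|y)\,\mu_{Y^n}(y)$$

(the sum is replaced by an integral when $\mathcal{Y}^n$ is continuous), where $\mu_{Y^n}$ is the probability distribution of a source $Y^n$ and we assume that the conditional probability distribution $\mu_{X^n|Y^n}$ is given. We use a constrained random number generator to construct an encoder. Let $\tilde{X}_A(c|y)$ be a random variable corresponding to the distribution

$$\mu_{\tilde{X}_A(c|y)}(x) \equiv \begin{cases}\dfrac{\mu_{X^n|Y^n}(x|y)}{\mu_{X^n|Y^n}(\mathcal{C}_A(c)|y)}, & \text{if } x\in\mathcal{C}_A(c),\\ 0, & \text{if } x\notin\mathcal{C}_A(c).\end{cases} \tag{28}$$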
Fig. 2. Construction of Lossy Source Code

We define the stochastic encoder $\Phi_n : \mathcal{Y}^n \to \mathrm{Im}\mathcal{B}$ and the decoder $\psi_n : \mathrm{Im}\mathcal{B} \to \mathcal{X}^n$ as

$$\Phi_n(y) \equiv B\tilde{X}_A(c|y) \tag{29}$$
$$\psi_n(m) \equiv x_{AB}(c,m), \tag{30}$$

where we declare an encoding error when $\mu_{X^n|Y^n}(\mathcal{C}_A(c)|y) = 0$, and $x_{AB}$ is defined as

$$x_{AB}(c,m) \equiv \arg\max_{x'\in\mathcal{C}_{AB}(c,m)}\mu_{X^n}(x'). \tag{31}$$

The flow of vectors is illustrated in Fig. 2. 
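A brute-force sketch of (28) to (31), for a toy setting whose parameters are all illustrative assumptions: $n = 8$, an i.i.d. Bernoulli(0.3) binary source, a memoryless test channel $\mu_{X^n|Y^n}$ that flips each bit with probability 0.2 (so $\mu_{X^n}$ is i.i.d. with $P(X_i = 1) = 0.3 \cdot 0.8 + 0.7 \cdot 0.2 = 0.38$), and fixed binary matrices $A$ ($3 \times 8$) and $B$ ($2 \times 8$). The encoder samples $x$ from (28) and outputs $Bx$; the decoder returns a maximizer of $\mu_{X^n}$ over $\mathcal{C}_{AB}(c,m)$, with ties broken by Python's `max`, which keeps the first maximal element. At this block length the distortion is not meant to be small; the sketch only shows the data flow of Fig. 2.

```python
import itertools
import random

N, P_SRC, FLIP = 8, 0.3, 0.2
ALL_X = list(itertools.product((0, 1), repeat=N))

def mat_vec(M, x):
    """Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)

def channel_prob(x, y):
    """mu_{X|Y}(x|y): each reproduction bit differs from the source bit w.p. FLIP."""
    d = sum(a != b for a, b in zip(x, y))
    return FLIP ** d * (1 - FLIP) ** (N - d)

def mu_x(x):
    """Marginal mu_X for an i.i.d. Bernoulli(P_SRC) source passed through channel_prob."""
    q = P_SRC * (1 - FLIP) + (1 - P_SRC) * FLIP  # P(X_i = 1)
    k = sum(x)
    return q ** k * (1 - q) ** (N - k)

def encode(A, B, c, y, rng):
    """Stochastic encoder (29): draw x from (28) on C_A(c) and output Bx."""
    coset = [x for x in ALL_X if mat_vec(A, x) == c]
    x = rng.choices(coset, weights=[channel_prob(x, y) for x in coset], k=1)[0]
    return mat_vec(B, x)

def decode(A, B, c, m):
    """Decoder (30)-(31): a maximizer of mu_X over C_AB(c, m) (first one on ties)."""
    coset = [x for x in ALL_X if mat_vec(A, x) == c and mat_vec(B, x) == m]
    return max(coset, key=mu_x)

A = [(1, 1, 0, 1, 0, 0, 1, 0), (0, 1, 1, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 1, 1, 1)]
B = [(1, 0, 0, 1, 0, 1, 0, 0), (0, 1, 0, 0, 1, 0, 1, 1)]
c = (0, 1, 1)
rng = random.Random(0)
y = tuple(int(rng.random() < P_SRC) for _ in range(N))
m = encode(A, B, c, y, rng)
x_hat = decode(A, B, c, m)
print(y, m, x_hat, "Hamming distortion:", sum(a != b for a, b in zip(x_hat, y)))
```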

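The error probability $\mathrm{Error}(A,B,c,D)$ is given as

$$\mathrm{Error}(A,B,c,D) \equiv P\left(d_n(\psi_n(\Phi_n(Y^n)), Y^n) > D\right), \tag{32}$$

where we define $d_n(\psi_n(\Phi_n(y)), y) \equiv \infty$ when $\mu_{X^n|Y^n}(\mathcal{C}_A(c)|y) = 0$. We have the following theorem, where the proof is given in Section VII-E.

Theorem 3: If $r, R > 0$ satisfy

$$r < \underline{H}(\mathbf{X}|\mathbf{Y}) \tag{33}$$
$$r + R > \overline{H}(\mathbf{X}), \tag{34}$$

then for any $\delta > 0$ and all sufficiently large $n$ there are functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ such that

$$\mathrm{Error}(A,B,c,D) \leq P\left(d_n(X^n,Y^n) > D\right) + \delta. \tag{35}$$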
By assuming that $\{\mu_{X^n|Y^n}\}_{n=1}^{\infty}$ satisfies

$$\overline{D}(\mathbf{X},\mathbf{Y}) < D,$$

we have

$$\lim_{n\to\infty}P\left(d_n(X^n,Y^n) > D\right) = 0$$

from the definition of $\overline{D}(\mathbf{X},\mathbf{Y})$. Then, by letting $n \to \infty$, $\delta \to 0$, and $r \to \underline{H}(\mathbf{X}|\mathbf{Y})$, we have the fact that for any $(R,D)$ close to the boundary of $\mathcal{R}(\mathbf{Y})$ there is a sequence of proposed codes such that

$$\lim_{n\to\infty}\mathrm{Error}(A,B,c,D) = 0.$$

Remark 12: We can find $c \in \mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$ satisfying (35) because $\mathrm{Error}(A, B, c, D) = 1$ when $\mathcal{C}_A(c) = \emptyset$.

Next, we consider a special case of the proposed code, which provides an interpretation of 
the conventional code introduced in [19] [12]. It should be noted that a constrained random 
number generator is unnecessary. 

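Let us assume that $\mu_{X^n}$ is the uniform distribution on $\mathcal{X}^n$ and $(\mathcal{A}, p_A)$ is an ensemble of matrices $A : \mathcal{X}^n \to \mathcal{X}^{l}$ satisfying

$$r = \frac{1}{n}\log|\mathrm{Im}\mathcal{A}|$$

when $r > 0$ is given. We fix a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$ so that they are available for constructing an encoder and a decoder.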

Since $\mathcal{C}_A(\mathbf{0})$ is a linear space, there is a surjective linear function $B : \mathcal{X}^n \to \mathcal{C}_A(\mathbf{0})$. We use the encoder defined by (29). The rate $R$ of the code is given as

$$R = \frac{1}{n}\log|\mathcal{C}_A(\mathbf{0})|.$$
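As a quick numerical illustration of this rate, the Python sketch below enumerates $\mathrm{Im}A$ and $\mathcal{C}_A(\mathbf{0})$ for a small, hypothetical binary matrix. Since $A$ is linear, $|\mathrm{Im}A|\,|\mathcal{C}_A(\mathbf{0})| = |\mathcal{X}|^n$, so $R = \log|\mathcal{X}| - \frac{1}{n}\log|\mathrm{Im}A| \ge \log|\mathcal{X}| - r$ because $\mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$. The matrix, the GF(2) setting, and the choice $\mathrm{Im}\mathcal{A} = \{0,1\}^l$ are assumptions made only for this example.

```python
import itertools
import math


def mat_vec(M, x):
    """Compute Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def image_and_kernel_sizes(A, n):
    """Return |Im A| and |C_A(0)| by enumerating {0,1}^n."""
    image, kernel = set(), 0
    for x in itertools.product((0, 1), repeat=n):
        v = mat_vec(A, x)
        image.add(v)
        kernel += all(b == 0 for b in v)  # count x with Ax = 0
    return len(image), kernel


A = [(1, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 1)]  # hypothetical 3 x 5 matrix
n, l = 5, len(A)
img, ker = image_and_kernel_sizes(A, n)
r = l / n                 # (1/n) log2 |Im calA| with Im calA = {0,1}^l
R = math.log2(ker) / n    # rate of the code in bits per symbol
```

Here the third row is the sum of the first two, so $|\mathrm{Im}A| = 4 < 2^3$ and the rate $R = 0.6$ exceeds $\log|\mathcal{X}| - r = 0.4$ bits; equality would hold for a full-rank $A$.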

Furthermore, since $B$ is surjective, there is a bijective linear function $x'_{AB} : \mathrm{Im}A \times \mathcal{C}_A(\mathbf{0}) \to \mathcal{X}^n$ such that $x'_{AB}(Ax, Bx) = x$ for all $x$. We replace the function $x_{AB}$ by $x'_{AB}$ in the definition of the decoder (30). Let $\mathrm{Error}(A, c, D)$ be the error probability given as

$$\mathrm{Error}(A, c, D) \equiv P\left(d_n(\psi_n(\varphi_n(Y^n)), Y^n) > D\right).$$

We have the following corollary, which is shown in Section VII-F. 
Corollary 4: If $r$ satisfies

$$r < \underline{H}(X|Y), \tag{36}$$

then for any $\delta > 0$ and all sufficiently large $n$ there are a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}\mathcal{A}$ such that

$$R \le \log|\mathcal{X}| - r + \delta \tag{37}$$

$$\mathrm{Error}(A, c, D) \le P\left(d_n(X^n, Y^n) > D\right) + \delta. \tag{38}$$

When the boundary of the right hand side of (8) is attained with a general channel $W = \{\mu_{X^n|Y^n}\}_{n=1}^{\infty}$ such that $\mu_{X^n}$ is uniform, for any $(R, D)$ close to the boundary of $\mathcal{R}(Y)$ there is a sequence of proposed codes such that

$$\lim_{n\to\infty} \mathrm{Error}(A, c, D) = 0.$$

VI. Constrained Random Number Generation by Using Sum-Product Algorithm

In this section, we introduce an algorithm for generating random numbers subject to the distributions (14) and (28) by assuming that $\mu_{X^n}$ and $\mu_{X^n|Y^n}$ are memoryless, that is, they are given by

$$\mu_{X^n}(x) = \prod_{i=1}^{n}\mu_{X_i}(x_i) \tag{39}$$

$$\mu_{X^n|Y^n}(x|y) = \prod_{i=1}^{n}\mu_{X_i|Y_i}(x_i|y_i)$$

for each $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, respectively. In the following, we construct a random number generator subject to the distribution defined by

$$\tilde{\mu}_{X^n}(x) \equiv \begin{cases} \dfrac{\mu_{X^n}(x)}{\mu_{X^n}(\mathcal{C}_A(c))}, & \text{if } x \in \mathcal{C}_A(c), \\ 0, & \text{if } x \notin \mathcal{C}_A(c), \end{cases} \tag{40}$$

for a $\mu_{X^n}$ given by (39), $A$, and $c \in \mathrm{Im}\mathcal{A}$. It should be noted that (14) can be reduced to (40) by considering a function $(A, B) : \mathcal{X}^n \to \mathrm{Im}\mathcal{A} \times \mathrm{Im}\mathcal{B}$ defined as $(A, B)x \equiv (Ax, Bx)$, and (28) can also be reduced to (40) by letting $\mu_{X_i} \equiv \mu_{X_i|Y_i}(\cdot|y_i)$ for a given $y$.
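For small $n$, (40) can be sampled directly by enumerating $\mathcal{C}_A(c)$, which gives a reference against which the algorithm below can be checked. The Python sketch assumes a binary alphabet, a matrix $A$ over GF(2), and hypothetical per-symbol probabilities `p1[i]` $= \mu_{X_i}(1)$; stacking the rows of $A$ and $B$ realizes the reduction of (14) to (40) through $(A, B)x = (Ax, Bx)$. The function names are illustrative only.

```python
import itertools
import math
import random


def mat_vec(M, x):
    """Compute Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def constrained_distribution(A, c, p1):
    """Return {x: tilde_mu(x)} for (40), with mu memoryless and P(X_i = 1) = p1[i]."""
    n = len(p1)
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        if mat_vec(A, x) == c:  # keep only x in C_A(c)
            weights[x] = math.prod(p if xi else 1 - p for xi, p in zip(x, p1))
    total = sum(weights.values())  # mu(C_A(c)); assumed positive
    return {x: w / total for x, w in weights.items()}


def sample(A, c, p1, rng):
    """Draw one sequence subject to (40)."""
    dist = constrained_distribution(A, c, p1)
    xs = list(dist)
    return rng.choices(xs, weights=[dist[x] for x in xs])[0]


A = [(1, 1, 0, 1), (0, 1, 1, 1)]
B = [(1, 0, 0, 1)]
p1 = [0.3, 0.6, 0.2, 0.7]  # hypothetical mu_{X_i}(1)
# Reduction of (14) to (40): impose Ax = c and Bx = m jointly via the stacked matrix (A, B).
dist = constrained_distribution(A + B, (1, 0) + (1,), p1)
x = sample(A, (1, 0), p1, random.Random(0))
```

Exhaustive enumeration costs $|\mathcal{X}|^n$ operations, which is exactly what the sum-product based generator below is meant to avoid.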

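Let us assume that there is a family $\{\mathcal{Z}_j\}_{j\in\mathcal{J}}$ of sets such that $\mathrm{Im}\mathcal{A} \subset \times_{j\in\mathcal{J}}\mathcal{Z}_j$. For a set of local functions $\{f_j : \mathcal{X}^{|\mathcal{S}_j|} \to \mathbb{R}\}_{j\in\mathcal{J}}$, the sum-product algorithm [1], [17] approximately calculates a real-valued global function $g$ on $\mathcal{X}$ defined as

$$g(x_i) \equiv \sum_{x\setminus\{x_i\}}\prod_{j\in\mathcal{J}} f_j(x_{\mathcal{S}_j}),$$

where the summation $\sum_{x\setminus\{x_i\}}$ is taken over all $x \in \mathcal{X}^n$ except for the variable $x_i$, and the function $f_j$ depends only on the set of variables $x_{\mathcal{S}_j} \equiv \{x_i\}_{i\in\mathcal{S}_j}$. It should be noted that the algorithm calculates the global function exactly when the corresponding factor graph has no loop. Let $\pi_{x_i\to f_j}(x_i)$ and $\sigma_{f_j\to x_i}(x_i)$ be messages defined as

$$\pi_{x_i\to f_j}(x_i) \equiv \prod_{j'\in\mathcal{J}\setminus\{j\} : i\in\mathcal{S}_{j'}} \sigma_{f_{j'}\to x_i}(x_i)$$

$$\sigma_{f_j\to x_i}(x_i) \equiv \frac{\sum_{x_{\mathcal{S}_j}\setminus\{x_i\}} f_j(x_{\mathcal{S}_j})\prod_{i'\in\mathcal{S}_j\setminus\{i\}}\pi_{x_{i'}\to f_j}(x_{i'})}{\sum_{x_{\mathcal{S}_j}} f_j(x_{\mathcal{S}_j})\prod_{i'\in\mathcal{S}_j\setminus\{i\}}\pi_{x_{i'}\to f_j}(x_{i'})},$$

where the summation in the denominator is taken over all $\{x_i\}_{i\in\mathcal{S}_j}$, $\pi_{x_i\to f_j}(x_i) \equiv 1$ when there is no $j' \in \mathcal{J}\setminus\{j\}$ such that $i \in \mathcal{S}_{j'}$, and $\sigma_{f_j\to x_i}(x_i) \equiv f_j(x_i)/\sum_{x_i} f_j(x_i)$ when $\mathcal{S}_j = \{i\}$. The sum-product algorithm is performed by repeating the above operations for every message $\sigma_{f_j\to x_i}(x_i)$ and $\pi_{x_i\to f_j}(x_i)$ satisfying $i \in \mathcal{S}_j$ and finally calculating the approximation of the global function (up to a normalizing constant) as

$$g(x_i) \approx \prod_{j\in\mathcal{J} : i\in\mathcal{S}_j}\sigma_{f_j\to x_i}(x_i),$$

where we assign initial values to $\pi_{x_i\to f_j}(x_i)$ and $\sigma_{f_j\to x_i}(x_i)$ when they appear on the right hand side of the above operations and are undefined.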
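The message updates above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes discrete variables with values `0..q-1`, represents each local function $f_j$ as a pair `(S_j, f_j)`, uses a flooding schedule with all messages initialized to 1, and normalizes the final product so that it can be compared with the exact normalized global function. The small factor graph at the end (unary factors $\mu_{X_i}$ plus two parity constraints) is hypothetical and has no loop, so the result should agree with brute-force summation.

```python
import itertools
import math


def sum_product(n, q, factors, iters=20):
    """Approximate normalized g(x_i) for prod_j f_j(x_{S_j}) by sum-product message passing.

    factors: list of (S_j, f_j), where S_j is a tuple of variable indices and f_j maps a
    tuple of values (one per index in S_j, in order) to a nonnegative real number.
    """
    edges = [(j, i) for j, (S, _) in enumerate(factors) for i in S]
    sigma = {e: [1.0] * q for e in edges}  # sigma_{f_j -> x_i}, initialized to 1
    for _ in range(iters):
        # pi_{x_i -> f_j}: product of sigma_{f_j' -> x_i} over the other factors j' at x_i
        pi_msg = {}
        for j, i in edges:
            msg = [1.0] * q
            for jj, ii in edges:
                if ii == i and jj != j:
                    msg = [a * b for a, b in zip(msg, sigma[(jj, ii)])]
            pi_msg[(j, i)] = msg
        # sigma_{f_j -> x_i}: sum f_j times the incoming pi's over x_{S_j} \ {x_i}, normalize
        new_sigma = {}
        for j, (S, f) in enumerate(factors):
            for pos, i in enumerate(S):
                msg = [0.0] * q
                for vals in itertools.product(range(q), repeat=len(S)):
                    w = f(vals)
                    for pos2, i2 in enumerate(S):
                        if pos2 != pos:
                            w *= pi_msg[(j, i2)][vals[pos2]]
                    msg[vals[pos]] += w
                z = sum(msg)
                new_sigma[(j, i)] = [m / z for m in msg] if z > 0 else [1.0 / q] * q
        sigma = new_sigma
    marginals = []
    for i in range(n):
        b = [1.0] * q
        for jj, ii in edges:
            if ii == i:
                b = [a * s for a, s in zip(b, sigma[(jj, ii)])]
        z = sum(b)
        marginals.append([v / z for v in b])
    return marginals


def brute_force(n, q, factors):
    """Exact normalized g(x_i) by summing over all of X^n."""
    g = [[0.0] * q for _ in range(n)]
    for x in itertools.product(range(q), repeat=n):
        w = math.prod(f(tuple(x[i] for i in S)) for S, f in factors)
        for i in range(n):
            g[i][x[i]] += w
    return [[v / sum(row) for v in row] for row in g]


p1 = [0.3, 0.6, 0.2, 0.7]  # hypothetical mu_{X_i}(1)
factors = [((i,), (lambda v, p=p: p if v[0] else 1 - p)) for i, p in enumerate(p1)]
factors.append(((0, 1, 2), lambda v: float((v[0] ^ v[1] ^ v[2]) == 1)))
factors.append(((2, 3), lambda v: float((v[0] ^ v[1]) == 0)))
approx = sum_product(4, 2, factors)
exact = brute_force(4, 2, factors)
```

On graphs with loops (for example, those of dense parity-check matrices) the same updates still run, but the output is only an approximation, as noted above.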

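In the following, we introduce an algorithm for constrained random number generation. For each $i \in \{1, \dots, l\}$, let $a_i : \mathcal{X}^{|\mathcal{S}_i|} \to \mathcal{Z}$ be a function such that

$$Ax = \left(a_1(x_{\mathcal{S}_1}), a_2(x_{\mathcal{S}_2}), \dots, a_l(x_{\mathcal{S}_l})\right),$$

where the $i$-th component of $Ax$ depends only on the set of variables $x_{\mathcal{S}_i} \equiv \{x_j\}_{j\in\mathcal{S}_i}$. For example, when $A = (a_{ij})$ is an $l \times n$ sparse matrix with a maximum row weight $w$, we have $\mathcal{X} = \mathcal{Z}$, the set $\mathcal{S}_i$ defined as

$$\mathcal{S}_i \equiv \{j \in \{1, \dots, n\} : a_{ij} \neq 0\}$$

satisfies $|\mathcal{S}_i| \le w$ for all $i \in \{1, \dots, l\}$, and $a_i(x_{\mathcal{S}_i})$ is defined as the inner product of the $i$-th row $(a_{i1}, \dots, a_{in})$ of $A$ and $x$. Let $x_i^j \equiv (x_i, \dots, x_j)$, where $x_i^j$ is a null string if $i > j$. Let $c \equiv (c_1, \dots, c_l) \in \mathrm{Im}\mathcal{A}$. Let $\chi(\cdot)$ be defined as

$$\chi(S) \equiv \begin{cases} 1, & \text{if the statement } S \text{ is true}, \\ 0, & \text{if the statement } S \text{ is false}. \end{cases} \tag{41}$$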

Constrained Random Number Generation Algorithm:

Step 1 Let $k \equiv 1$.

Step 2 Calculate the conditional probability distribution $p_{X_k|X_1^{k-1}}$ defined as

$$p_{X_k|X_1^{k-1}}(x_k|x_1^{k-1}) \equiv \frac{\sum_{x_{k+1}^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i)}{\sum_{x_k^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i)}. \tag{42}$$

It should be noted that the sum-product algorithm can be employed to obtain (42), where $\{\mu_{X_j}\}_{j=k}^{n}$ and $\{\chi(a_i(x_{\mathcal{S}_i}) = c_i)\}_{i=1}^{l}$ are local functions and we substitute the generated sequence $x_1^{k-1}$ into (42). If $\chi(a_i(x_{\mathcal{S}_i}) = c_i)$ becomes a constant after the substitution of $x_1^{k-1}$, we can remove this local function in preparation for the subsequent steps.

Step 3 Generate and record a random number $x_k$ corresponding to the distribution $p_{X_k|X_1^{k-1}}(\cdot|x_1^{k-1})$.

Step 4 If $k = n$, output $x \equiv x_1^n$ and terminate.

Step 5 If for the generated sequence $x_1^k$ there is a unique $x_{k+1}^n$ such that $x \equiv x_1^n \in \mathcal{C}_A(c)$, obtain the unique vector $x_{k+1}^n$, output $x$, and terminate.

Step 6 Let $k \leftarrow k + 1$ and go to Step 2.

Remark 13: We can omit Step 5 if it is hard to execute. 

Remark 14: When $A$ is a linear function with rank $l'$ and its last $l'$ columns are linearly independent, we can easily determine at Step 5 whether or not there is a unique $x_{k+1}^n$ such that $x_1^n \in \mathcal{C}_A(c)$ for a given $x_1^k$ by checking whether $k = n - l'$. We can obtain the unique $x_{k+1}^n$ from $x_1^k$ by a linear operation.
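The following Python sketch runs Steps 1-4 and 6 of the algorithm on a toy binary example, omitting Step 5 as permitted by Remark 13. To keep it short and exact, (42) is computed here by brute-force summation over all completions $x_{k+1}^n$ instead of by the sum-product algorithm; this substitution is an assumption of the sketch and is only feasible for small $n$. The factor $\prod_{j<k}\mu_{X_j}(x_j)$ of the already generated prefix appears in every term and cancels in the ratio. The function `path_probability` multiplies the conditionals (42) along a sequence, which allows a numerical check that the output follows (40), as stated in Theorem 5 below. All names and parameters are hypothetical, and $c$ must belong to $\mathrm{Im}A$.

```python
import itertools
import math
import random


def weight(x, A, c, p1):
    """prod_j mu_{X_j}(x_j) * prod_i chi(a_i(x_{S_i}) = c_i) for a binary x, over GF(2)."""
    in_coset = all(sum(a * b for a, b in zip(row, x)) % 2 == ci for row, ci in zip(A, c))
    return math.prod(p if xi else 1 - p for xi, p in zip(x, p1)) if in_coset else 0.0


def conditional(prefix, A, c, p1):
    """Exact (42): [p(0 | prefix), p(1 | prefix)], summing over all completions."""
    n, k = len(p1), len(prefix)
    num = [0.0, 0.0]
    for xk in (0, 1):
        for rest in itertools.product((0, 1), repeat=n - k - 1):
            num[xk] += weight(prefix + (xk,) + rest, A, c, p1)
    z = num[0] + num[1]  # positive whenever the prefix can still reach C_A(c)
    return [num[0] / z, num[1] / z]


def generate(A, c, p1, rng):
    """Steps 1-4 and 6 (Step 5 omitted): draw x_1, ..., x_n one symbol at a time."""
    x = ()
    for _ in range(len(p1)):
        p = conditional(x, A, c, p1)
        x += (1 if rng.random() < p[1] else 0,)
    return x


def path_probability(x, A, c, p1):
    """prod_k p(x_k | x_1^{k-1}); stops as soon as a factor is zero."""
    prob = 1.0
    for k in range(len(x)):
        prob *= conditional(x[:k], A, c, p1)[x[k]]
        if prob == 0.0:
            return 0.0
    return prob


A = [(1, 1, 0, 1, 0), (0, 1, 1, 0, 1)]
c = (1, 0)
p1 = [0.2, 0.7, 0.4, 0.5, 0.9]  # hypothetical mu_{X_i}(1)
x = generate(A, c, p1, random.Random(3))
```

Replacing `conditional` by a sum-product computation over the factor graph of $\{\mu_{X_j}\}$ and $\{\chi(a_i(x_{\mathcal{S}_i}) = c_i)\}$ gives the scalable version described in Step 2; the brute-force version is kept here only because it is exact.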

Remark 15: It should be noted that the memoryless condition on $X^n$ is not essential for the description of the algorithm. The algorithm is well-defined when we use the formula

$$\mu_{X^n}(x) = \prod_{i=1}^{n}\mu_{X_i|X_1^{i-1}}(x_i|x_1^{i-1})$$

and replace $\mu_{X_i}(x_i)$ by $\mu_{X_i|X_1^{i-1}}(x_i|x_1^{i-1})$ for $i \ge 2$ in (42). However, the sum-product algorithm may not find a good approximation in general because the corresponding factor graph may have many loops.

We have the following theorem, which is shown in Section VII-G.

Theorem 5: Assume that (42) is computed exactly. Then the proposed algorithm generates $x \equiv x_1^n$ subject to the probability distribution given by (40).

In the following, we consider a situation where we can use a real number $\omega$ subject to the uniform distribution on $[0, 1)$. We modify the proposed algorithm, where the basic idea comes from the interval algorithm introduced in [15] and is analogous to arithmetic coding [29]. It should be noted that only Steps 1 and 3 are modified.

Interval Constrained Random Number Generation Algorithm:

Step 1 Let $k \equiv 1$ and $[\underline{\theta}_0, \overline{\theta}_0) \equiv [0, 1)$.



Step 2 Calculate the conditional probability distribution $p_{X_k|X_1^{k-1}}$ defined by (42).

Step 3 Partition the interval $[\underline{\theta}_{k-1}, \overline{\theta}_{k-1})$ into sub-intervals labeled by the elements of $\mathcal{X}$, where the widths of the sub-intervals are proportional to $p_{X_k|X_1^{k-1}}(x_k|x_1^{k-1})$. Let $[\underline{\theta}_k, \overline{\theta}_k)$ be the sub-interval that contains $\omega$, that is, $\omega \in [\underline{\theta}_k, \overline{\theta}_k)$ is satisfied for the given $\omega$. Let $x_k$ be the label that corresponds to the sub-interval $[\underline{\theta}_k, \overline{\theta}_k)$ and record it.

Step 4 If $k = n$, output $x \equiv x_1^n$ and terminate.

Step 5 If for the generated sequence $x_1^k$ there is a unique $x_{k+1}^n$ such that $x \equiv x_1^n \in \mathcal{C}_A(c)$, obtain the unique vector $x_{k+1}^n$, output $x$, and terminate.

Step 6 Let $k \leftarrow k + 1$ and go to Step 2.

From Theorem 5, we have the fact that the probability that $\omega$ falls in the final sub-interval $[\underline{\theta}_{k'}, \overline{\theta}_{k'})$ is equal to its width, which is equal to the probability $\tilde{\mu}_{X^n}(x)$ of the generated sequence $x$, where $k'$ is the value of $k$ when the algorithm terminates.
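A minimal Python sketch of the interval variant is given below. It assumes a binary alphabet, $A$ over GF(2), rational symbol probabilities, and (as in the previous sketch) an exact brute-force evaluation of (42) in place of the sum-product algorithm. Exact `Fraction` arithmetic is used so that the widths of the final sub-intervals can be compared with $\tilde{\mu}_{X^n}(x)$ without rounding. Step 5 is again omitted, and the sub-interval of label 0 is placed on the left; both are choices of this sketch rather than of the paper.

```python
import itertools
import math
from fractions import Fraction


def weight(x, A, c, p1):
    """Unnormalized (40): prod_j mu_{X_j}(x_j) if Ax = c over GF(2), else 0 (exact)."""
    in_coset = all(sum(a * b for a, b in zip(row, x)) % 2 == ci for row, ci in zip(A, c))
    return math.prod(p if xi else 1 - p for xi, p in zip(x, p1)) if in_coset else Fraction(0)


def conditional(prefix, A, c, p1):
    """Exact (42) for x_k in {0, 1}."""
    n, k = len(p1), len(prefix)
    num = [Fraction(0), Fraction(0)]
    for xk in (0, 1):
        for rest in itertools.product((0, 1), repeat=n - k - 1):
            num[xk] += weight(prefix + (xk,) + rest, A, c, p1)
    z = num[0] + num[1]
    return [num[0] / z, num[1] / z]


def interval_generate(omega, A, c, p1):
    """Return x and the final sub-interval [lo, hi) that contains omega."""
    lo, hi = Fraction(0), Fraction(1)  # [theta_0, overline-theta_0) = [0, 1)
    x = ()
    for _ in range(len(p1)):
        p = conditional(x, A, c, p1)
        split = lo + (hi - lo) * p[0]  # label 0 gets [lo, split), label 1 gets [split, hi)
        if omega < split:
            x, hi = x + (0,), split
        else:
            x, lo = x + (1,), split
    return x, (lo, hi)


A = [(1, 1, 0, 1, 0), (0, 1, 1, 0, 1)]
c = (1, 0)
p1 = [Fraction(1, 5), Fraction(7, 10), Fraction(2, 5), Fraction(1, 2), Fraction(9, 10)]
x, (lo, hi) = interval_generate(Fraction(1, 3), A, c, p1)
```

Starting from $\omega = 0$ and repeatedly restarting at the right end of the returned sub-interval walks through all sub-intervals from left to right, which makes it easy to confirm that they partition $[0,1)$ with widths $\tilde{\mu}_{X^n}(x)$.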

It should be noted that we can construct a deterministic code from a stochastic code by fixing a random number $\omega \in [0, 1)$. In fact, by using the random coding argument, we can show that there is a value $\omega \in [0, 1)$ such that the error probability is sufficiently small. This is because, from Theorems 1 and 3, the average error probability with respect to the random variable corresponding to a random number on $[0, 1)$ is sufficiently small.

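Remark 16: Instead of a real number $\omega$, we can use a binary random sequence $\omega_1, \omega_2, \dots$ subject to the uniform distribution on $\{0, 1\}$ by letting $\omega \equiv 0.\omega_1\omega_2\cdots \in [0, 1)$, which is the binary expansion of a real number. Since we can estimate $\mu_{X^n}(\mathcal{C}_A(c)) \approx 1/|\mathrm{Im}\mathcal{A}|$ approximately and the average entropy of $\tilde{X}^n(c)$ is given as

$$E_c\left[H(\tilde{X}^n(c))\right] \approx \sum_{x}\mu_{X^n}(x)\log\frac{1}{|\mathrm{Im}\mathcal{A}|\,\mu_{X^n}(x)} = H(X^n) - \log|\mathrm{Im}\mathcal{A}|, \tag{43}$$

the required length of the binary sequence can be estimated approximately as at least $H(X^n) - \log|\mathrm{Im}\mathcal{A}|$.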
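The estimate (43) can be explored on toy examples with the Python sketch below, which also shows how a finite bit string $\omega_1\omega_2\cdots\omega_L$ is read as the binary expansion $\omega = 0.\omega_1\omega_2\cdots\omega_L$. It assumes a binary alphabet, a full-rank matrix $A$ over GF(2) (so that every coset $\mathcal{C}_A(c)$ is non-empty), and $c$ uniform on $\{0,1\}^l$; entropies are in bits. In the uniform case $\mu_{X_i}(1) = 1/2$, (43) holds with equality because each $\tilde{X}^n(c)$ is uniform on a coset of size $2^{n-l}$; for other distributions it is only an approximation, and the sketch merely reports the gap.

```python
import itertools
import math
from fractions import Fraction


def omega_from_bits(bits):
    """omega = 0.b_1 b_2 ... b_L (binary expansion) as an exact fraction in [0, 1)."""
    return sum((Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits)), Fraction(0))


def average_constrained_entropy(A, p1):
    """E_c[H(X~(c))] in bits for c uniform on {0,1}^l, assuming A is full rank over GF(2)."""
    cosets = {}
    for x in itertools.product((0, 1), repeat=len(p1)):
        c = tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)
        cosets.setdefault(c, []).append(math.prod(p if xi else 1 - p for xi, p in zip(x, p1)))
    total = 0.0
    for ws in cosets.values():
        z = sum(ws)  # mu(C_A(c))
        total -= sum(w / z * math.log2(w / z) for w in ws if w > 0)
    return total / 2 ** len(A)


def source_entropy(p1):
    """H(X^n) in bits for a memoryless binary source."""
    return -sum(p * math.log2(p) + (1 - p) * math.log2(1 - p) for p in p1)


A = [(1, 1, 0, 1, 0, 0), (0, 1, 1, 0, 1, 1)]  # full rank, l = 2
uniform = [0.5] * 6
skewed = [0.1, 0.3, 0.2, 0.4, 0.15, 0.25]      # hypothetical
estimate_gap = average_constrained_entropy(A, skewed) - (source_entropy(skewed) - len(A))
```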

VII. Proofs of Theorems 

A. Proof of Lemma 1 
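Since

$$\underline{I}(X; Y) \ge \underline{H}(X) - \overline{H}(X|Y)$$

for any $(X, Y)$, we have

$$C(W) = \sup_{X} \underline{I}(X; Y) \ge \sup_{X}\left[\underline{H}(X) - \overline{H}(X|Y)\right]. \tag{44}$$

In the following, we prove that

$$C(W) \le \sup_{X}\left[\underline{H}(X) - \overline{H}(X|Y)\right], \tag{45}$$

which completes the proof of the lemma.

From the definition of $C(W)$, we have the fact that for any $\delta > 0$ there is a sequence of pairs, each consisting of an encoder $\varphi_n : \mathcal{M}_n \to \mathcal{X}^n$ and a decoder $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$, such that

$$\liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| \ge C(W) - \delta \tag{46}$$

$$\lim_{n\to\infty} P\left(\psi_n(Y^n) \neq M_n\right) = 0. \tag{47}$$

We can assume* that $\mathcal{M}_n \subset \mathcal{X}^n$ without loss of generality. Since the distribution $\mu_{M_n}$ of $M_n$ is uniform on $\mathcal{M}_n$, we have the fact that

$$\frac{1}{n}\log\frac{1}{\mu_{M_n}(x)} = \frac{1}{n}\log|\mathcal{M}_n| \ge \liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| - \delta \tag{48}$$

for all $x \in \mathcal{M}_n$, $\delta > 0$, and sufficiently large $n$. Since

$$\frac{1}{n}\log\frac{1}{\mu_{M_n}(x)} = \infty$$

for every $x \notin \mathcal{M}_n$, we have the fact that

$$\frac{1}{n}\log\frac{1}{\mu_{M_n}(x)} \ge \liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| - \delta$$

for every $x \in \mathcal{X}^n$, $\delta > 0$, and sufficiently large $n$. This implies that

$$\lim_{n\to\infty} P\left(\frac{1}{n}\log\frac{1}{\mu_{M_n}(M_n)} < \liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| - \delta\right) = 0. \tag{49}$$

*This assumption is used merely so that $\mathbf{M} \equiv \{M_n\}_{n=1}^{\infty}$ is a general source satisfying $M_n \in \mathcal{X}^n$. It should be noted that $\mathcal{M}_n$ and $\{\varphi_n(m) : m \in \mathcal{M}_n\}$ are different subsets of $\mathcal{X}^n$ in general. We could define a channel code by a subset $\mathcal{M}_n$ of $\mathcal{X}^n$ as in [14], [31] instead of introducing an encoder $\varphi_n$; we introduce an encoder $\varphi_n$ in order to consider a stochastic encoder.

Let $\mathbf{M} \equiv \{M_n\}_{n=1}^{\infty}$ be a general source. Then we have

$$\liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| - \delta \le \underline{H}(\mathbf{M}) \tag{50}$$

from (49) and the definition of $\underline{H}(\mathbf{M})$. We have

$$\begin{aligned} C(W) &\le \liminf_{n\to\infty}\frac{1}{n}\log|\mathcal{M}_n| + \delta \\ &\le \underline{H}(\mathbf{M}) + 2\delta \\ &= \underline{H}(\mathbf{M}) - \overline{H}(\mathbf{M}|Y) + 2\delta \\ &\le \sup_{X}\left[\underline{H}(X) - \overline{H}(X|Y)\right] + 2\delta, \end{aligned} \tag{51}$$

where the first inequality comes from (46), the second inequality comes from (50), and the equality comes from (47) and Lemma 7. We have (45) by letting $\delta \to 0$. ■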

B. Proof of Lemma 2 
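Since

$$\overline{I}(X; Y) \le \overline{H}(X) - \underline{H}(X|Y)$$

for any $(X, Y)$, we have

$$\mathcal{R}(Y) = \bigcup_{X}\left\{(R, D) : \begin{array}{l} \overline{I}(X; Y) \le R \\ \overline{D}(X, Y) \le D \end{array}\right\} \supset \bigcup_{X}\left\{(R, D) : \begin{array}{l} \overline{H}(X) - \underline{H}(X|Y) \le R \\ \overline{D}(X, Y) \le D \end{array}\right\}.$$

In the following, we prove that

$$\mathcal{R}(Y) \subset \bigcup_{X}\left\{(R, D) : \begin{array}{l} \overline{H}(X) - \underline{H}(X|Y) \le R \\ \overline{D}(X, Y) \le D \end{array}\right\}, \tag{52}$$

which completes the proof of the lemma.

Assume that $(R, D) \in \mathcal{R}(Y)$. From (6), we have the fact that for all $\delta > 0$ and all sufficiently large $n$, there is a pair consisting of an encoder $\varphi_n$ and a decoder $\psi_n$ satisfying (4) and (5). Let $X^n \equiv \psi_n(\varphi_n(Y^n)) \in \mathcal{X}^n$. Then we have

$$P\left(\frac{1}{n}\log\frac{1}{\mu_{X^n}(X^n)} > R + \varepsilon\right) \le P\left(\frac{1}{n}\log\frac{1}{\mu_{X^n}(X^n)} > \frac{1}{n}\log|\mathcal{M}_n| + \varepsilon\right) \le 2^{-n\varepsilon} \tag{53}$$

for any $\varepsilon > 0$, where the first inequality comes from (4), and the second inequality comes from [14, Lemma 2.6.2] and the fact that $X^n$ takes at most $|\mathcal{M}_n|$ values. By letting $n \to \infty$, we have the fact that the general source $X \equiv \{\psi_n(\varphi_n(Y^n))\}_{n=1}^{\infty}$ satisfies

$$\overline{H}(X) - \underline{H}(X|Y) \le \overline{H}(X) \le R + \varepsilon. \tag{54}$$

By letting $\varepsilon \to 0$, we have

$$\overline{H}(X) - \underline{H}(X|Y) \le R. \tag{55}$$

On the other hand, we have

$$\lim_{n\to\infty} P\left(d_n(X^n, Y^n) > D\right) = 0$$

from (5) by letting $n \to \infty$ and $\delta \to 0$. This implies that

$$\overline{D}(X, Y) \le D. \tag{56}$$

Then we have

$$(R, D) \in \bigcup_{X}\left\{(R, D) : \begin{array}{l} \overline{H}(X) - \underline{H}(X|Y) \le R \\ \overline{D}(X, Y) \le D \end{array}\right\},$$

which implies (52). ■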

C. Proof of Theorem 1 

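We omit the dependence on $n$ of $X$ and $Y$ when they appear in the subscript of $\mu$. From (19) and (20), we have the fact that there is $\varepsilon > 0$ satisfying

$$r > \overline{H}(X|Y) + \varepsilon \tag{57}$$

$$r + R < \underline{H}(X) - \varepsilon. \tag{58}$$

Let $\mathcal{T}_X \subset \mathcal{X}^n$ and $\mathcal{T}_{X|Y} \subset \mathcal{X}^n \times \mathcal{Y}^n$ be defined as

$$\mathcal{T}_X \equiv \left\{x : \frac{1}{n}\log\frac{1}{\mu_X(x)} \ge \underline{H}(X) - \varepsilon\right\} \tag{59}$$

$$\mathcal{T}_{X|Y} \equiv \left\{(x, y) : \frac{1}{n}\log\frac{1}{\mu_{X|Y}(x|y)} \le \overline{H}(X|Y) + \varepsilon\right\}. \tag{60}$$

Assume that $(x, y) \in \mathcal{T}_{X|Y}$ and $x_A(Ax|y) \neq x$. Then we have the fact that there is $x' \in \mathcal{C}_A(Ax)$ such that $x' \neq x$ and

$$\mu_{X|Y}(x'|y) \ge \mu_{X|Y}(x|y) \ge 2^{-n[\overline{H}(X|Y) + \varepsilon]}.$$

This implies that $[\mathcal{T}_{X|Y}(y) \setminus \{x\}] \cap \mathcal{C}_A(Ax) \neq \emptyset$, where $\mathcal{T}_{X|Y}(y)$ is defined as

$$\mathcal{T}_{X|Y}(y) \equiv \{x : (x, y) \in \mathcal{T}_{X|Y}\}.$$

We have

$$\begin{aligned} E_A\left[\chi(x_A(Ax|y) \neq x)\right] &\le p_A\left(\left\{A : [\mathcal{T}_{X|Y}(y) \setminus \{x\}] \cap \mathcal{C}_A(Ax) \neq \emptyset\right\}\right) \\ &\le \frac{|\mathcal{T}_{X|Y}(y)|\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A \\ &\le 2^{-n[r - \overline{H}(X|Y) - \varepsilon]}\alpha_A + \beta_A \end{aligned} \tag{61}$$

for all $(x, y) \in \mathcal{T}_{X|Y}$, where $\chi(\cdot)$ is defined by (41), the second inequality comes from Lemma 4, and the third inequality comes from the fact that $|\mathcal{T}_{X|Y}(y)| \le 2^{n[\overline{H}(X|Y) + \varepsilon]}$. We have the fact that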



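$$\begin{aligned} E_A\left[\sum_{x,y}\mu_{XY}(x, y)\chi(x_A(Ax|y) \neq x)\right] &= \sum_{(x,y)\in\mathcal{T}_{X|Y}}\mu_{XY}(x, y)E_A\left[\chi(x_A(Ax|y) \neq x)\right] + \sum_{(x,y)\notin\mathcal{T}_{X|Y}}\mu_{XY}(x, y)E_A\left[\chi(x_A(Ax|y) \neq x)\right] \\ &\le 2^{-n[r - \overline{H}(X|Y) - \varepsilon]}\alpha_A + \beta_A + \mu_{XY}([\mathcal{T}_{X|Y}]^c), \end{aligned} \tag{62}$$

where the last inequality comes from (61). We also have the fact that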

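$$\begin{aligned} &E_{AB}\left[\sum_{c,m}\left|\mu_X(\mathcal{C}_{AB}(c, m)) - \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right|\right] \\ &\le E_{AB}\left[\sum_{c,m}\left|\mu_X(\mathcal{C}_{AB}(c, m) \cap \mathcal{T}_X) - \frac{\mu_X(\mathcal{T}_X)}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right|\right] + E_{AB}\left[\sum_{c,m}\mu_X(\mathcal{C}_{AB}(c, m) \cap [\mathcal{T}_X]^c)\right] + \sum_{c,m}\frac{\mu_X([\mathcal{T}_X]^c)}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|} \\ &= \mu_X(\mathcal{T}_X)\,E_{AB}\left[\sum_{c,m}\left|\frac{\mu_X(\mathcal{C}_{AB}(c, m) \cap \mathcal{T}_X)}{\mu_X(\mathcal{T}_X)} - \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right|\right] + 2\mu_X([\mathcal{T}_X]^c) \\ &\le \mu_X(\mathcal{T}_X)\sqrt{\alpha_{AB} - 1 + \frac{[\beta_{AB} + 1]|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\max_{x\in\mathcal{T}_X}\mu_X(x)}{\mu_X(\mathcal{T}_X)}} + 2\mu_X([\mathcal{T}_X]^c) \\ &\le \sqrt{\alpha_{AB} - 1 + [\beta_{AB} + 1]2^{-n[\underline{H}(X) - r - R - \varepsilon]}} + 2\mu_X([\mathcal{T}_X]^c), \end{aligned} \tag{63}$$

where the second inequality comes from Lemma 5. Then we have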
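$$\begin{aligned} &E_{ABc}\left[\mathrm{Error}(A, B, c)\right] \\ &= E_{AB}\left[\sum_{\substack{c,m:\\ \mu_X(\mathcal{C}_{AB}(c,m)) = 0}}\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|} + \sum_{\substack{c,m,x,y:\\ \mu_X(\mathcal{C}_{AB}(c,m)) > 0\\ x\in\mathcal{C}_{AB}(c,m)\\ x_A(c|y) \neq x}}\frac{\mu_{XY}(x, y)}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\,\mu_X(\mathcal{C}_{AB}(c, m))}\right] \\ &\le E_{A}\left[\sum_{x,y}\mu_{XY}(x, y)\chi(x_A(Ax|y) \neq x)\right] + E_{AB}\left[\sum_{c,m}\left|\mu_X(\mathcal{C}_{AB}(c, m)) - \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right|\right] \\ &\le 2^{-n[r - \overline{H}(X|Y) - \varepsilon]}\alpha_A + \beta_A + \mu_{XY}([\mathcal{T}_{X|Y}]^c) + \sqrt{\alpha_{AB} - 1 + [\beta_{AB} + 1]2^{-n[\underline{H}(X) - r - R - \varepsilon]}} + 2\mu_X([\mathcal{T}_X]^c), \end{aligned} \tag{64}$$

where $c$ is a random variable corresponding to the uniform distribution on $\mathrm{Im}\mathcal{A}$, the first inequality comes from the bound $\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\mu_X(\mathcal{C}_{AB}(c,m))} \le 1 + \left|\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\mu_X(\mathcal{C}_{AB}(c,m))} - 1\right|$, the fact that $x \in \mathcal{C}_{AB}(c, m)$ determines $c = Ax$, and the fact that

$$\begin{aligned} \sum_{\substack{c,m,x,y:\\ \mu_X(\mathcal{C}_{AB}(c,m)) > 0\\ x\in\mathcal{C}_{AB}(c,m)}}\mu_{XY}(x, y)\left|\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\mu_X(\mathcal{C}_{AB}(c, m))} - 1\right| &= \sum_{\substack{c,m:\\ \mu_X(\mathcal{C}_{AB}(c,m)) > 0}}\mu_X(\mathcal{C}_{AB}(c, m))\left|\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\mu_X(\mathcal{C}_{AB}(c, m))} - 1\right| \\ &= \sum_{\substack{c,m:\\ \mu_X(\mathcal{C}_{AB}(c,m)) > 0}}\left|\mu_X(\mathcal{C}_{AB}(c, m)) - \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right| \\ &= \sum_{c,m}\left|\mu_X(\mathcal{C}_{AB}(c, m)) - \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\right| - \sum_{\substack{c,m:\\ \mu_X(\mathcal{C}_{AB}(c,m)) = 0}}\frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}, \end{aligned} \tag{65}$$

and the second inequality comes from (62) and (63). From (57), (58), (64), and the fact that $\alpha_A \to 1$, $\beta_A \to 0$, $\alpha_{AB} \to 1$, $\beta_{AB} \to 0$, $\mu_X([\mathcal{T}_X]^c) \to 0$, and $\mu_{XY}([\mathcal{T}_{X|Y}]^c) \to 0$ as $n \to \infty$, we have the fact that there are functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ satisfying (21). ■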



D. Proof of Corollary 2 
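Inequality (24) is shown as

$$R = \frac{1}{n}\log|\mathcal{M}_n| = \frac{1}{n}\log\frac{|\mathcal{X}|^n}{|\mathrm{Im}A|} \ge \log|\mathcal{X}| - r,$$

where the inequality comes from the definition of $r$ and the fact that $\mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$.

Since $\mu_{X^n}$ is uniform and for given $c \in \mathrm{Im}A$ and $m \in \mathcal{M}_n$ there is a unique $x \in \mathcal{C}_{AB}(c, m)$, we have the fact that

$$\frac{1}{|\mathcal{C}_A(c)|} = \frac{\mu_{X^n}(x)}{|\mathcal{M}_n|\,\mu_{X^n}(\mathcal{C}_{AB}(c, m))}$$

for all $m$. Then we have

$$E_{Ac}\left[\mathrm{Error}(A, c)\right] \le 2^{-n[r - \overline{H}(X|Y) - \varepsilon]}\alpha_A + \beta_A + \mu_{XY}([\mathcal{T}_{X|Y}]^c) + \sqrt{\alpha_{AB} - 1 + [\beta_{AB} + 1]2^{-n[\underline{H}(X) - r - R - \varepsilon]}} + 2\mu_X([\mathcal{T}_X]^c) \tag{66}$$

from (64). From (23), (66), and the fact that $\alpha_A \to 1$, $\beta_A \to 0$, $\mu_{XY}([\mathcal{T}_{X|Y}]^c) \to 0$, and $\mu_X([\mathcal{T}_X]^c) \to 0$ as $n \to \infty$, we have the fact that for any $\delta > 0$ and sufficiently large $n$ there are a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}\mathcal{A}$ satisfying (25).

Now, we prove (27) following the proof presented in [31], [14, Example 3.2.1]. Assume that $\mu_{Y^n|X^n}$ is a channel with additive noise $\mathbf{Z} \equiv \{Y^n - X^n\}_{n=1}^{\infty}$. Since the channel $\mu_{Y^n|X^n}$ is weakly symmetric (see [5, p. 190]), the reverse channel $\mu_{X^n|Y^n}$ is also weakly symmetric when the channel input distribution $\mu_{X^n}$ is uniform. This implies that $\overline{H}(X|Y)$ does not depend on $Y$ and

$$\overline{H}(X|Y) = \overline{H}(X|\mathbf{0}) = \overline{H}(-\mathbf{Z}) = \overline{H}(\mathbf{Z}).$$

We have

$$\begin{aligned} \underline{I}(X; Y) &\le \overline{H}(X) - \overline{H}(X|Y) \\ &\le \log|\mathcal{X}| - \overline{H}(X|Y) \\ &= \log|\mathcal{X}| - \overline{H}(\mathbf{Z}). \end{aligned} \tag{67}$$

This implies that $\log|\mathcal{X}| - \overline{H}(\mathbf{Z}) \ge C(W)$. On the other hand, the supremum on the right hand side of (3) is achieved by assuming that $\mu_{X^n}$ is the uniform distribution on $\mathcal{X}^n$. This implies that $\log|\mathcal{X}| - \overline{H}(\mathbf{Z})$ is the capacity of this channel. ■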
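For a memoryless additive noise channel over $\mathbb{Z}_q$ (a special case of the general channels considered here; the modular-addition model and the noise distribution below are assumptions of this example), the formula $\log|\mathcal{X}| - \overline{H}(\mathbf{Z})$ reduces to $\log q - H(Z)$ per symbol. The Python sketch below compares it with a standard Blahut-Arimoto computation of the capacity of the corresponding discrete memoryless channel. Since the uniform input is optimal for this symmetric channel, the two values should agree.

```python
import math


def blahut_arimoto(W, iters=200):
    """Capacity in bits of a discrete memoryless channel W[x][y] (Blahut-Arimoto iteration)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx

    def divergences(p):
        # D(W(.|x) || q_Y) in bits for every input x, where q_Y is the induced output law
        qy = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        return [sum(W[x][y] * math.log2(W[x][y] / qy[y]) for y in range(ny) if W[x][y] > 0)
                for x in range(nx)]

    for _ in range(iters):
        d = divergences(p)
        z = sum(p[x] * 2 ** d[x] for x in range(nx))
        p = [p[x] * 2 ** d[x] / z for x in range(nx)]
    d = divergences(p)
    return sum(p[x] * d[x] for x in range(nx))  # I(X;Y) at the final input distribution


q = 3
pz = [0.7, 0.2, 0.1]  # hypothetical noise distribution P(Z = z), z in Z_3
W = [[pz[(y - x) % q] for y in range(q)] for x in range(q)]  # Y = X + Z mod q
formula = math.log2(q) - (-sum(p * math.log2(p) for p in pz))
capacity = blahut_arimoto(W)
```

The general statement (27) concerns non-memoryless noise and the spectral quantity $\overline{H}(\mathbf{Z})$; the sketch covers only the memoryless case, where $\overline{H}(\mathbf{Z})$ equals the per-symbol entropy $H(Z)$.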



E. Proof of Theorem 3 

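We omit the dependence on $n$ of $X$ and $Y$ when they appear in the subscript of $\mu$. From (33) and (34), we have the fact that there is $\varepsilon > 0$ satisfying

$$r < \underline{H}(X|Y) - \varepsilon \tag{68}$$

$$r + R > \overline{H}(X) + \varepsilon. \tag{69}$$

Let $\mathcal{T}_X \subset \mathcal{X}^n$ and $\mathcal{T}_{X|Y} \subset \mathcal{X}^n \times \mathcal{Y}^n$ be defined as

$$\mathcal{T}_X \equiv \left\{x : \frac{1}{n}\log\frac{1}{\mu_X(x)} \le \overline{H}(X) + \varepsilon\right\}$$

$$\mathcal{T}_{X|Y} \equiv \left\{(x, y) : \frac{1}{n}\log\frac{1}{\mu_{X|Y}(x|y)} \ge \underline{H}(X|Y) - \varepsilon\right\}.$$

Assume that $x \in \mathcal{T}_X$ and $x_{AB}(Ax, Bx) \neq x$. Then we have the fact that there is $x' \in \mathcal{C}_{AB}(Ax, Bx)$ such that $x' \neq x$ and

$$\mu_X(x') \ge \mu_X(x) \ge 2^{-n[\overline{H}(X) + \varepsilon]}.$$

This implies that $[\mathcal{T}_X \setminus \{x\}] \cap \mathcal{C}_{AB}(Ax, Bx) \neq \emptyset$. Then we have

$$\begin{aligned} E_{AB}\left[\chi(x_{AB}(Ax, Bx) \neq x)\right] &\le p_{AB}\left(\left\{(A, B) : [\mathcal{T}_X \setminus \{x\}] \cap \mathcal{C}_{AB}(Ax, Bx) \neq \emptyset\right\}\right) \\ &\le \frac{|\mathcal{T}_X|\,\alpha_{AB}}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|} + \beta_{AB} \\ &\le 2^{-n[r + R - \overline{H}(X) - \varepsilon]}\alpha_{AB} + \beta_{AB}, \end{aligned} \tag{70}$$

where $\chi(\cdot)$ is defined by (41), the second inequality comes from Lemma 4, and the last inequality comes from the fact that $|\mathcal{T}_X| \le 2^{n[\overline{H}(X) + \varepsilon]}$. We have the fact that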



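$$\begin{aligned} E_{AB}\left[\sum_{x}\mu_X(x)\chi(x_{AB}(Ax, Bx) \neq x)\right] &= \sum_{x\in\mathcal{T}_X}\mu_X(x)E_{AB}\left[\chi(x_{AB}(Ax, Bx) \neq x)\right] + \sum_{x\notin\mathcal{T}_X}\mu_X(x)E_{AB}\left[\chi(x_{AB}(Ax, Bx) \neq x)\right] \\ &\le 2^{-n[r + R - \overline{H}(X) - \varepsilon]}\alpha_{AB} + \beta_{AB} + \mu_X([\mathcal{T}_X]^c), \end{aligned} \tag{71}$$

where the last inequality comes from (70). We also have the fact that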



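$$\begin{aligned} &E_{A}\left[\sum_{c,y}\mu_Y(y)\left|\mu_{X|Y}(\mathcal{C}_A(c)|y) - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right|\right] \\ &\le \sum_{y}\mu_{X|Y}(\mathcal{T}_{X|Y}(y)|y)\mu_Y(y)E_A\left[\sum_{c}\left|\frac{\mu_{X|Y}(\mathcal{C}_A(c) \cap \mathcal{T}_{X|Y}(y)|y)}{\mu_{X|Y}(\mathcal{T}_{X|Y}(y)|y)} - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right|\right] + 2\sum_{y}\mu_{X|Y}([\mathcal{T}_{X|Y}(y)]^c|y)\mu_Y(y) \\ &\le \sum_{y}\mu_{X|Y}(\mathcal{T}_{X|Y}(y)|y)\mu_Y(y)\sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]|\mathrm{Im}\mathcal{A}|\max_{x\in\mathcal{T}_{X|Y}(y)}\mu_{X|Y}(x|y)}{\mu_{X|Y}(\mathcal{T}_{X|Y}(y)|y)}} + 2\mu_{XY}([\mathcal{T}_{X|Y}]^c) \\ &\le \sqrt{\alpha_A - 1 + [\beta_A + 1]2^{-n[\underline{H}(X|Y) - r - \varepsilon]}} + 2\mu_{XY}([\mathcal{T}_{X|Y}]^c), \end{aligned} \tag{72}$$

where the second inequality comes from Lemma 5. Then we have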
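$$\begin{aligned} &E_{ABc}\left[\mathrm{Error}(A, B, c, D)\right] \\ &\le E_{ABc}\left[\sum_{\substack{y:\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) = 0}}\mu_Y(y) + \sum_{\substack{x,y:\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) > 0\\ x\in\mathcal{C}_A(c)\\ d_n(x,y) > D \text{ or } x_{AB}(c,Bx) \neq x}}\frac{\mu_{X|Y}(x|y)\mu_Y(y)}{\mu_{X|Y}(\mathcal{C}_A(c)|y)}\right] \\ &= E_{AB}\left[\sum_{\substack{c,y:\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) = 0}}\frac{\mu_Y(y)}{|\mathrm{Im}\mathcal{A}|} + \sum_{\substack{c,x,y:\\ x\in\mathcal{C}_A(c)\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) > 0\\ d_n(x,y) > D \text{ or } x_{AB}(c,Bx) \neq x}}\frac{\mu_{XY}(x, y)}{|\mathrm{Im}\mathcal{A}|\,\mu_{X|Y}(\mathcal{C}_A(c)|y)}\right] \\ &\le P\left(d_n(X^n, Y^n) > D\right) + E_{AB}\left[\sum_{x}\mu_X(x)\chi(x_{AB}(Ax, Bx) \neq x)\right] + E_A\left[\sum_{c,y}\mu_Y(y)\left|\mu_{X|Y}(\mathcal{C}_A(c)|y) - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right|\right] \\ &\le P\left(d_n(X^n, Y^n) > D\right) + 2^{-n[r + R - \overline{H}(X) - \varepsilon]}\alpha_{AB} + \beta_{AB} + \mu_X([\mathcal{T}_X]^c) + \sqrt{\alpha_A - 1 + [\beta_A + 1]2^{-n[\underline{H}(X|Y) - r - \varepsilon]}} + 2\mu_{XY}([\mathcal{T}_{X|Y}]^c), \end{aligned} \tag{73}$$

where $c$ is a random variable corresponding to the uniform distribution on $\mathrm{Im}\mathcal{A}$, the first inequality comes from the fact that $\psi_n(\varphi_n(y)) = x$ whenever $\tilde{X}^n(c|y) = x$ and $x_{AB}(c, Bx) = x$, the second inequality comes from the bound $\frac{1}{|\mathrm{Im}\mathcal{A}|\mu_{X|Y}(\mathcal{C}_A(c)|y)} \le 1 + \left|\frac{1}{|\mathrm{Im}\mathcal{A}|\mu_{X|Y}(\mathcal{C}_A(c)|y)} - 1\right|$ and the fact that

$$\begin{aligned} \sum_{\substack{c,x,y:\\ x\in\mathcal{C}_A(c)\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) > 0}}\mu_{XY}(x, y)\left|\frac{1}{|\mathrm{Im}\mathcal{A}|\mu_{X|Y}(\mathcal{C}_A(c)|y)} - 1\right| &= \sum_{\substack{c,y:\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) > 0}}\mu_{X|Y}(\mathcal{C}_A(c)|y)\mu_Y(y)\left|\frac{1}{|\mathrm{Im}\mathcal{A}|\mu_{X|Y}(\mathcal{C}_A(c)|y)} - 1\right| \\ &= \sum_{c,y}\mu_Y(y)\left|\mu_{X|Y}(\mathcal{C}_A(c)|y) - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right| - \sum_{\substack{c,y:\\ \mu_{X|Y}(\mathcal{C}_A(c)|y) = 0}}\frac{\mu_Y(y)}{|\mathrm{Im}\mathcal{A}|}, \end{aligned} \tag{74}$$

and the third inequality comes from (71) and (72). From (68), (69), (73), and the fact that $\alpha_A \to 1$, $\beta_A \to 0$, $\alpha_{AB} \to 1$, $\beta_{AB} \to 0$, $\mu_X([\mathcal{T}_X]^c) \to 0$, and $\mu_{XY}([\mathcal{T}_{X|Y}]^c) \to 0$ as $n \to \infty$, we have the fact that there are functions $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $c \in \mathrm{Im}\mathcal{A}$ satisfying (35). ■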



F. Proof of Corollary 4 

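Since $x'_{AB}(c, Bx) = x$ is satisfied for all $x \in \mathcal{C}_A(c)$, we can substitute

$$\chi(x'_{AB}(c, Bx) \neq x) = 0$$

in the derivation of (73) and obtain

$$E_{Ac}\left[\mathrm{Error}(A, c, D)\right] \le P\left(d_n(X^n, Y^n) > D\right) + \sqrt{\alpha_A - 1 + [\beta_A + 1]2^{-n[\underline{H}(X|Y) - r - \varepsilon]}} + 2\mu_{XY}([\mathcal{T}_{X|Y}]^c). \tag{75}$$

On the other hand, from Lemma 5, we have

$$E_{Ac}\left[\left|\frac{|\mathrm{Im}\mathcal{A}||\mathcal{C}_A(c)|}{|\mathcal{X}|^n} - 1\right|\right] \le \sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]|\mathrm{Im}\mathcal{A}|}{|\mathcal{X}|^n}} = \sqrt{\alpha_A - 1 + [\beta_A + 1]2^{-n[\log|\mathcal{X}| - r]}}. \tag{76}$$

By using the Markov inequality, (36), (75), (76), and the fact that $\alpha_A \to 1$, $\beta_A \to 0$, and $\mu_{XY}([\mathcal{T}_{X|Y}]^c) \to 0$ as $n \to \infty$, we have the fact that for any $\delta > 0$ and sufficiently large $n$ there are a matrix $A \in \mathcal{A}$ and a vector $c \in \mathrm{Im}\mathcal{A}$ satisfying (38) and

$$\left|\frac{|\mathrm{Im}\mathcal{A}||\mathcal{C}_A(c)|}{|\mathcal{X}|^n} - 1\right| < 1. \tag{77}$$

Then we have the fact that $c \in \mathrm{Im}A \subset \mathrm{Im}\mathcal{A}$ because the left hand side of (77) is equal to 1 when $c \in \mathrm{Im}\mathcal{A} \setminus \mathrm{Im}A$. From (77) and the fact that $A$ is a linear function, we have

$$\frac{|\mathrm{Im}\mathcal{A}||\mathcal{C}_A(\mathbf{0})|}{|\mathcal{X}|^n} = \frac{|\mathrm{Im}\mathcal{A}||\mathcal{C}_A(c)|}{|\mathcal{X}|^n} < 2 \tag{78}$$

and

$$R = \frac{1}{n}\log|\mathcal{C}_A(\mathbf{0})| < \frac{1}{n}\log\frac{2|\mathcal{X}|^n}{|\mathrm{Im}\mathcal{A}|} \le \log|\mathcal{X}| - r + \delta \tag{79}$$

for all $\delta > 0$ and sufficiently large $n$. ■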



G. Proof of Theorem 5 

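Let $g'_0$ and $g'_k : \mathcal{X} \to \mathbb{R}$ be defined as

$$g'_0 \equiv \sum_{x}\prod_{j=1}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i) \tag{80}$$

$$g'_k(x_k) \equiv \sum_{x_{k+1}^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i) \tag{81}$$

for a given $c = (c_1, \dots, c_l) \in \mathcal{Z}^l$, where $x_1^{k-1}$ in (81) is the sequence generated in the previous steps. Then we have

$$p_{X_k|X_1^{k-1}}(x_k|x_1^{k-1}) = \frac{\sum_{x_{k+1}^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i)}{\sum_{x_k^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i)} = \frac{g'_k(x_k)}{\sum_{x_k}g'_k(x_k)}. \tag{82}$$

If the algorithm terminates with $k = n$ at Step 4, we have

$$g'_n(x_n) = \mu_{X_n}(x_n). \tag{83}$$

On the other hand, if the algorithm terminates with $k = k'$ at Step 5, we have

$$g'_{k'}(x_{k'}) = \mu_{X_{k'}}(x_{k'})\sum_{x_{k'+1}^n}\prod_{j=k'+1}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i) = \prod_{j=k'}^{n}\mu_{X_j}(x_j), \tag{84}$$

where the second equality comes from the fact that for a given $x_1^{k'}$ there is a unique $x_{k'+1}^n$ such that $x_1^n \in \mathcal{C}_A(c)$. Since (83) is a special case of (84) with $k' = n$, we assume that the algorithm terminates at $k = k'$ in the following. Since

$$\sum_{x_1}g'_1(x_1) = \sum_{x_1}\mu_{X_1}(x_1)\sum_{x_2^n}\prod_{j=2}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i) = g'_0 \tag{85}$$

and

$$\sum_{x_k}g'_k(x_k) = \sum_{x_k^n}\prod_{j=k}^{n}\mu_{X_j}(x_j)\prod_{i=1}^{l}\chi(a_i(x_{\mathcal{S}_i}) = c_i) = \frac{g'_{k-1}(x_{k-1})}{\mu_{X_{k-1}}(x_{k-1})} \tag{86}$$

for $k \ge 2$, we have the fact that (40) is rephrased as

$$\begin{aligned} \tilde{\mu}_{X^n}(x) &= \frac{\prod_{j=1}^{n}\mu_{X_j}(x_j)}{g'_0} \\ &= \frac{g'_{k'}(x_{k'})\prod_{j=1}^{k'-1}\mu_{X_j}(x_j)}{g'_0} \\ &= \frac{g'_1(x_1)}{g'_0}\prod_{k=2}^{k'}\frac{\mu_{X_{k-1}}(x_{k-1})\,g'_k(x_k)}{g'_{k-1}(x_{k-1})} \\ &= \prod_{k=1}^{k'}\frac{\mu_{X_{k-1}}(x_{k-1})\,g'_k(x_k)}{g'_{k-1}(x_{k-1})} \\ &= \prod_{k=1}^{k'}\frac{g'_k(x_k)}{\sum_{x_k}g'_k(x_k)} \\ &= \prod_{k=1}^{k'}p_{X_k|X_1^{k-1}}(x_k|x_1^{k-1}), \end{aligned} \tag{87}$$

where the first equality comes from (40), the second equality comes from (80) and (84), we denote $g'_0(x_0) \equiv g'_0$ and $\mu_{X_0}(x_0) \equiv 1$ in the fourth equality, the fifth equality comes from (85) and (86), and the last equality comes from (82).

Since the algorithm generates a sequence $x = x_1^n$ subject to $\prod_{k=1}^{k'}p_{X_k|X_1^{k-1}}(x_k|x_1^{k-1})$, the proposed algorithm generates $x$ subject to the probability distribution given by (40). ■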

Appendix 

We prove the lemmas used in the proofs of the theorems. Some proofs are presented for 
the completeness of this paper. 

A. Lemma Analogous to Fano Inequality 

We prove the following lemma which is analogous to the Fano inequality. It should be 
noted that a stronger version of this lemma has been proved in [16, Lemma 4]. 

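Lemma 7: Let $(\mathbf{U}, \mathbf{V}) \equiv \{(U^n, V^n)\}_{n=1}^{\infty}$ be a pair consisting of two sequences of random variables. If there is $\{\psi_n\}_{n=1}^{\infty}$ such that

$$\lim_{n\to\infty} P\left(\psi_n(V^n) \neq U^n\right) = 0, \tag{88}$$

then

$$\overline{H}(\mathbf{U}|\mathbf{V}) = 0. \tag{89}$$

Proof: For $\gamma > 0$, let

$$\mathcal{G} \equiv \left\{(u, v) : \frac{1}{n}\log\frac{1}{\mu_{U^n|V^n}(u|v)} > \gamma\right\}$$

$$\mathcal{S} \equiv \{(u, v) : \psi_n(v) = u\}.$$

Then we have

$$\begin{aligned} P\left((U^n, V^n) \in \mathcal{G}\right) &\le P\left(\psi_n(V^n) \neq U^n\right) + \sum_{(u,v)\in\mathcal{G}\cap\mathcal{S}}\mu_{U^n|V^n}(u|v)\mu_{V^n}(v) \\ &\le P\left(\psi_n(V^n) \neq U^n\right) + \sum_{v}\mu_{V^n}(v)\sum_{u : \psi_n(v) = u}2^{-n\gamma} \\ &= P\left(\psi_n(V^n) \neq U^n\right) + 2^{-n\gamma}, \end{aligned} \tag{90}$$

where the second inequality comes from the definition of $\mathcal{G}$ and the last equality comes from the fact that for all $v$ there is a unique $u$ satisfying $\psi_n(v) = u$. From this inequality and (88), we have

$$\lim_{n\to\infty} P\left(\frac{1}{n}\log\frac{1}{\mu_{U^n|V^n}(U^n|V^n)} > \gamma\right) = 0.$$

Then we have

$$0 \le \overline{H}(\mathbf{U}|\mathbf{V}) \le \gamma$$

from the definition of $\overline{H}(\mathbf{U}|\mathbf{V})$. We have (89) by letting $\gamma \to 0$. ■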

B. Proof of (H3') 

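If an ensemble satisfies (H3), then we have

$$\begin{aligned} \sum_{u\in\mathcal{T}}\sum_{u'\in\mathcal{T}'}p_A(\{A : Au = Au'\}) &= \sum_{u\in\mathcal{T}\cap\mathcal{T}'}p_A(\{A : Au = Au\}) + \sum_{u\in\mathcal{T}}\sum_{\substack{u'\in\mathcal{T}'\setminus\{u\}:\\ p_A(\{A : Au = Au'\}) \le \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_A(\{A : Au = Au'\}) + \sum_{u\in\mathcal{T}}\sum_{\substack{u'\in\mathcal{T}'\setminus\{u\}:\\ p_A(\{A : Au = Au'\}) > \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_A(\{A : Au = Au'\}) \\ &\le |\mathcal{T}\cap\mathcal{T}'| + \sum_{u\in\mathcal{T}}\sum_{u'\in\mathcal{T}'}\frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \sum_{u\in\mathcal{T}}\beta_A \\ &= |\mathcal{T}\cap\mathcal{T}'| + \frac{|\mathcal{T}||\mathcal{T}'|\alpha_A}{|\mathrm{Im}\mathcal{A}|} + |\mathcal{T}|\beta_A \\ &= |\mathcal{T}\cap\mathcal{T}'| + \frac{|\mathcal{T}||\mathcal{T}'|\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \min\{|\mathcal{T}|, |\mathcal{T}'|\}\beta_A \end{aligned} \tag{91}$$

for any $\mathcal{T}$ and $\mathcal{T}'$ satisfying $|\mathcal{T}| \le |\mathcal{T}'|$.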
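As a sanity check of (91), the Python sketch below enumerates the uniform ensemble of all $2 \times 4$ binary matrices over GF(2). This ensemble is an assumed example: for it, $p_A(\{A : Au = Au'\}) = 2^{-2}$ whenever $u \neq u'$, so (H3) holds with $\alpha_A = 1$ and $\beta_A = 0$. The sketch compares both sides of (91) for hypothetical sets $\mathcal{T}$ and $\mathcal{T}'$ with $|\mathcal{T}| \le |\mathcal{T}'|$, using exact fractions.

```python
import itertools
from fractions import Fraction

L, N = 2, 4  # hypothetical matrix size l x n
MATRICES = [tuple(tuple(bits[r * N:(r + 1) * N]) for r in range(L))
            for bits in itertools.product((0, 1), repeat=L * N)]


def mat_vec(M, x):
    """Compute Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def collision_prob(u, v):
    """p_A({A : Au = Av}) under the uniform distribution on all L x N binary matrices."""
    hits = sum(mat_vec(M, u) == mat_vec(M, v) for M in MATRICES)
    return Fraction(hits, len(MATRICES))


def h3_prime_sides(T, T2, alpha, beta):
    """Left and right sides of (91) for sets T, T2 with |T| <= |T2|."""
    lhs = sum(collision_prob(u, v) for u in T for v in T2)
    im_size = 2 ** L  # |Im calA| = |{0,1}^L|
    rhs = (len(set(T) & set(T2)) + Fraction(len(T) * len(T2)) * alpha / im_size
           + min(len(T), len(T2)) * beta)
    return lhs, rhs


vectors = list(itertools.product((0, 1), repeat=N))
T = vectors[:3]
T2 = vectors[2:9]
lhs, rhs = h3_prime_sides(T, T2, alpha=Fraction(1), beta=Fraction(0))
```

Here $|\mathcal{T}\cap\mathcal{T}'| = 1$, so the left side is $1 + 20/4 = 6$ while the right side is $1 + 21/4$.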
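C. Proof of Lemma 3

Let

$$p_{A,u,u'} \equiv p_A(\{A : Au = Au'\})$$

$$p_{B,u,u'} \equiv p_B(\{B : Bu = Bu'\})$$

$$p_{AB,u,u'} \equiv p_{AB}(\{(A, B) : (A, B)u = (A, B)u'\}).$$

Then we have

$$\begin{aligned} \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{AB,u,u'} > \frac{\alpha_{AB}}{|\mathrm{Im}\mathcal{A}\times\mathcal{B}|}}}p_{AB,u,u'} &\le \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,u,u'}p_{B,u,u'} > \frac{\alpha_A\alpha_B}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}}}p_{A,u,u'}p_{B,u,u'} \\ &= \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,u,u'}p_{B,u,u'} > \frac{\alpha_A\alpha_B}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\\ p_{A,u,u'} > \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_{A,u,u'}p_{B,u,u'} + \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,u,u'}p_{B,u,u'} > \frac{\alpha_A\alpha_B}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\\ p_{A,u,u'} \le \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_{A,u,u'}p_{B,u,u'} \\ &\le \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,u,u'} > \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_{A,u,u'}p_{B,u,u'} + \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{B,u,u'} > \frac{\alpha_B}{|\mathrm{Im}\mathcal{B}|}}}p_{A,u,u'}p_{B,u,u'} \\ &\le \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{A,u,u'} > \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_{A,u,u'} + \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_{B,u,u'} > \frac{\alpha_B}{|\mathrm{Im}\mathcal{B}|}}}p_{B,u,u'} \\ &\le \beta_A + \beta_B \\ &= \beta_{AB}, \end{aligned} \tag{92}$$

where the first inequality comes from the fact that $\mathrm{Im}\mathcal{A}\times\mathcal{B} \subset \mathrm{Im}\mathcal{A}\times\mathrm{Im}\mathcal{B}$ and that $A$ and $B$ are mutually independent, the third inequality comes from the fact that $p_{A,u,u'} \le 1$ and $p_{B,u,u'} \le 1$, and the last inequality comes from (H3) for $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$. Since $(\alpha_{AB}, \beta_{AB})$ satisfies (H1) and (H2), we have the fact that $(\mathcal{A}\times\mathcal{B}, p_{AB})$ has an $(\alpha_{AB}, \beta_{AB})$-hash property. ■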

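D. Proof of Lemma 4

We have

$$\begin{aligned} p_A\left(\left\{A : [\mathcal{G}\setminus\{u\}] \cap \mathcal{C}_A(Au) \neq \emptyset\right\}\right) &\le \sum_{u'\in\mathcal{G}\setminus\{u\}}p_A(\{A : Au = Au'\}) \\ &\le |\{u\} \cap [\mathcal{G}\setminus\{u\}]| + \frac{|\{u\}||\mathcal{G}\setminus\{u\}|\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \min\{|\{u\}|, |\mathcal{G}\setminus\{u\}|\}\beta_A \\ &\le \frac{|\mathcal{G}|\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A, \end{aligned} \tag{93}$$

where the second inequality comes from (H3') by letting $\mathcal{T} \equiv \{u\}$ and $\mathcal{T}' \equiv \mathcal{G}\setminus\{u\}$.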



E. Proof of Lemma 5 
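Let $p_{A,u,u'}$ be defined as

$$p_{A,u,u'} \equiv p_A(\{A : Au = Au'\}).$$

Then we have

$$\begin{aligned} E_{Ac}\left[\left(\sum_{u\in\mathcal{T}}Q(u)\chi(Au = c)\right)^2\right] &= E_A\left[\sum_{u\in\mathcal{T}}Q(u)\sum_{u'\in\mathcal{T}}Q(u')\chi(Au = Au')E_c\left[\chi(Au' = c)\right]\right] \\ &= \frac{1}{|\mathrm{Im}\mathcal{A}|}\sum_{u\in\mathcal{T}}Q(u)\sum_{u'\in\mathcal{T}}Q(u')\,p_A(\{A : Au = Au'\}) \\ &\le \frac{1}{|\mathrm{Im}\mathcal{A}|}\sum_{u\in\mathcal{T}}Q(u)\left[Q(u) + \sum_{\substack{u'\in\mathcal{T}\setminus\{u\}:\\ p_{A,u,u'} \le \alpha_A/|\mathrm{Im}\mathcal{A}|}}\frac{Q(u')\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \sum_{\substack{u'\in\mathcal{T}\setminus\{u\}:\\ p_{A,u,u'} > \alpha_A/|\mathrm{Im}\mathcal{A}|}}p_{A,u,u'}\max_{u''\in\mathcal{T}}Q(u'')\right] \\ &\le \frac{Q(\mathcal{T})^2\alpha_A}{|\mathrm{Im}\mathcal{A}|^2} + \frac{Q(\mathcal{T})[\beta_A + 1]\max_{u\in\mathcal{T}}Q(u)}{|\mathrm{Im}\mathcal{A}|}, \end{aligned} \tag{94}$$

where $\chi(\cdot)$ is defined by (41), and the second equality comes from the fact that the uniqueness of the value $Au'$ implies

$$E_c\left[\chi(Au' = c)\right] = \frac{1}{|\mathrm{Im}\mathcal{A}|} \tag{95}$$

for any $A \in \mathcal{A}$ and $u' \in \mathcal{U}^n$ when the distribution of $c$ is uniform on $\mathrm{Im}\mathcal{A}$. Then the lemma is shown as

$$\begin{aligned} E_A\left[\sum_{c}\left|\frac{Q(\mathcal{T}\cap\mathcal{C}_A(c))}{Q(\mathcal{T})} - \frac{1}{|\mathrm{Im}\mathcal{A}|}\right|\right] &= E_{Ac}\left[\left|\frac{Q(\mathcal{T}\cap\mathcal{C}_A(c))\,|\mathrm{Im}\mathcal{A}|}{Q(\mathcal{T})} - 1\right|\right] \\ &\le \sqrt{E_{Ac}\left[\left(\frac{Q(\mathcal{T}\cap\mathcal{C}_A(c))\,|\mathrm{Im}\mathcal{A}|}{Q(\mathcal{T})} - 1\right)^2\right]} \\ &= \sqrt{\frac{|\mathrm{Im}\mathcal{A}|^2}{Q(\mathcal{T})^2}E_{Ac}\left[Q(\mathcal{T}\cap\mathcal{C}_A(c))^2\right] - \frac{2|\mathrm{Im}\mathcal{A}|}{Q(\mathcal{T})}E_{Ac}\left[Q(\mathcal{T}\cap\mathcal{C}_A(c))\right] + 1} \\ &= \sqrt{\frac{|\mathrm{Im}\mathcal{A}|^2}{Q(\mathcal{T})^2}E_{Ac}\left[\left(\sum_{u\in\mathcal{T}}Q(u)\chi(Au = c)\right)^2\right] - 1} \\ &\le \sqrt{\alpha_A - 1 + \frac{[\beta_A + 1]|\mathrm{Im}\mathcal{A}|\max_{u\in\mathcal{T}}Q(u)}{Q(\mathcal{T})}}, \end{aligned} \tag{96}$$

where the first inequality comes from the Jensen inequality, the third equality comes from the fact that $\{\mathcal{C}_A(c)\}_{c\in\mathrm{Im}\mathcal{A}}$ is a partition of $\mathcal{U}^n$, and the last inequality comes from (94).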
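The bound (96) can be checked exactly on a toy case with the Python sketch below. It assumes the uniform ensemble of all $2 \times 4$ binary matrices over GF(2), for which $\alpha_A = 1$ and $\beta_A = 0$ (as in the check of (91)), takes $\mathcal{T} = \{0,1\}^4$ with a hypothetical distribution $Q$, and averages over all matrices and all $c \in \{0,1\}^2$ with exact fractions for the left-hand side.

```python
import itertools
import math
from fractions import Fraction

L, N = 2, 4
MATRICES = [tuple(tuple(bits[r * N:(r + 1) * N]) for r in range(L))
            for bits in itertools.product((0, 1), repeat=L * N)]
VECTORS = list(itertools.product((0, 1), repeat=N))
IMAGE = list(itertools.product((0, 1), repeat=L))  # Im calA = {0,1}^L


def mat_vec(M, x):
    """Compute Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def lemma5_sides(Q, alpha=1.0, beta=0.0):
    """Exact left side of (96) and its right side, for T = {0,1}^N and weights Q."""
    q_total = sum(Q.values())  # Q(T)
    lhs = Fraction(0)
    for M in MATRICES:
        for c in IMAGE:
            q_c = sum(Q[u] for u in VECTORS if mat_vec(M, u) == c)  # Q(T cap C_A(c))
            lhs += abs(q_c * len(IMAGE) / q_total - 1)
    lhs /= len(MATRICES) * len(IMAGE)  # average over A and uniform c
    rhs = math.sqrt(alpha - 1 + (beta + 1) * len(IMAGE) * float(max(Q.values()) / q_total))
    return lhs, rhs


weights = [5, 1, 1, 2, 1, 3, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1]  # hypothetical, unnormalized
Q = {u: Fraction(w, sum(weights)) for u, w in zip(VECTORS, weights)}
lhs, rhs = lemma5_sides(Q)
```

For this ensemble the chain (94)-(96) can be evaluated in closed form: the quantity under the square root before the last inequality equals $3\sum_u Q(u)^2$, which is at most $4\max_u Q(u)$.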

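F. Proof of Lemma 6

For a type $t$, let $\mathcal{C}_t$ be defined as

$$\mathcal{C}_t \equiv \{u \in \mathcal{U}^n : t(u) = t\}.$$

We assume that $p_A(\{A : Au = \mathbf{0}\})$ depends on $u$ only through the type $t(u)$. For a given $u \in \mathcal{C}_t$, we define

$$p_{A,t} \equiv p_A(\{A : Au = \mathbf{0}\}).$$

We use the following lemma, which is proved for the completeness of the paper.

Lemma 8 ([21, Lemma 9]): Let $(\alpha_A, \beta_A)$ be defined by (12) and (13). Then

$$\alpha_A = |\mathrm{Im}\mathcal{A}|\max_{t\in\mathcal{H}_A}p_{A,t} \tag{97}$$

$$\beta_A = \sum_{t\in\mathcal{H}\setminus\mathcal{H}_A}|\mathcal{C}_t|\,p_{A,t}, \tag{98}$$

where $\mathcal{H}$ is the set of all types of length $n$ except for the type of the zero vector.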
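Proof: We have

$$\sum_{A}p_A(A)\left|\{u \in \mathcal{C}_t : Au = \mathbf{0}\}\right| = \sum_{u\in\mathcal{C}_t}\sum_{A : Au = \mathbf{0}}p_A(A) = |\mathcal{C}_t|\,p_{A,t}. \tag{99}$$

Similarly, we have

$$\frac{1}{|\mathcal{U}|^{ln}}\sum_{A\in\mathcal{U}^{l\times n}}\left|\{u \in \mathcal{C}_t : Au = \mathbf{0}\}\right| = |\mathcal{C}_t|\,|\mathcal{U}|^{-l}, \tag{100}$$

where the equality comes from the fact that

$$\frac{\left|\{A \in \mathcal{U}^{l\times n} : Au = \mathbf{0}\}\right|}{|\mathcal{U}|^{ln}} = |\mathcal{U}|^{-l} \tag{101}$$

because we can find $|\mathcal{U}|^{l(n-1)}$ matrices $A$ satisfying $Au = \mathbf{0}$ for a given $u \in \mathcal{C}_t$. The lemma can be shown immediately from (12), (13), (99), and (100). ■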
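Now we prove Lemma 6. It is enough to show (H3) because (H1) and (H2) are satisfied by the assumption of the lemma. Since the function $A$ is linear, we have

$$p_A(\{A : Au = Au'\}) = p_A(\{A : A[u - u'] = \mathbf{0}\}) = p_{A,t(u-u')}. \tag{102}$$

Then, for $u \neq u'$ satisfying $t(u - u') \in \mathcal{H}_A$, we have

$$p_A(\{A : Au = Au'\}) = p_{A,t(u-u')} \le \max_{t\in\mathcal{H}_A}p_{A,t} = \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}, \tag{103}$$

where the last equality comes from (97). Then we have the fact that $p_A(\{A : Au = Au'\}) > \alpha_A/|\mathrm{Im}\mathcal{A}|$ implies $t(u - u') \notin \mathcal{H}_A$. Finally, we have

$$\begin{aligned} \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ p_A(\{A : Au = Au'\}) > \frac{\alpha_A}{|\mathrm{Im}\mathcal{A}|}}}p_A(\{A : Au = Au'\}) &\le \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ t(u-u')\in\mathcal{H}\setminus\mathcal{H}_A}}p_{A,t(u-u')} \\ &= \sum_{t\in\mathcal{H}\setminus\mathcal{H}_A}\ \sum_{\substack{u'\in\mathcal{U}^n\setminus\{u\}:\\ t(u-u') = t}}p_{A,t} \\ &\le \sum_{t\in\mathcal{H}\setminus\mathcal{H}_A}|\mathcal{C}_t|\,p_{A,t} \\ &= \beta_A, \end{aligned} \tag{104}$$

where the last equality comes from (98). ■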
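For binary vectors the type $t(u)$ is determined by the Hamming weight, so (102)-(104) can be explored numerically. The Python sketch below is an illustration under stated assumptions, not part of the proof. It computes $p_{A,t}$ for every weight by enumerating two hypothetical linear ensembles of $3 \times 5$ binary matrices: all matrices (uniform), and matrices whose columns each contain exactly one 1. It then evaluates the left side of (104) for a chosen $\alpha_A$. For a linear ensemble this sum does not depend on $u$ and equals $\sum_t |\mathcal{C}_t|\,p_{A,t}$ over the weights with $p_{A,t} > \alpha_A/|\mathrm{Im}\mathcal{A}|$.

```python
import itertools
from fractions import Fraction
from math import comb

L, N = 3, 5


def mat_vec(M, x):
    """Compute Mx over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in M)


def all_matrices():
    """Uniform ensemble: every L x N binary matrix."""
    for bits in itertools.product((0, 1), repeat=L * N):
        yield tuple(tuple(bits[r * N:(r + 1) * N]) for r in range(L))


def column_weight_one_matrices():
    """Hypothetical ensemble: each column has exactly one 1, in a uniformly chosen row."""
    for rows in itertools.product(range(L), repeat=N):  # rows[j] = row of the 1 in column j
        yield tuple(tuple(1 if rows[j] == r else 0 for j in range(N)) for r in range(L))


def p_by_weight(ensemble):
    """p_{A,t} = P(Au = 0) for a representative u of each weight t = 1..N."""
    mats = list(ensemble())
    probs = {}
    for t in range(1, N + 1):
        u = (1,) * t + (0,) * (N - t)  # representative of C_t
        probs[t] = Fraction(sum(mat_vec(M, u) == (0,) * L for M in mats), len(mats))
    return probs


def beta_for(probs, alpha):
    """Left side of (104): sum of |C_t| p_{A,t} over weights with p_{A,t} > alpha / |Im A|."""
    threshold = Fraction(alpha) / 2 ** L
    return sum((comb(N, t) * p for t, p in probs.items() if p > threshold), Fraction(0))


uniform = p_by_weight(all_matrices)
weight_one = p_by_weight(column_weight_one_matrices)
```

In the uniform case every nonzero weight gives $p_{A,t} = 2^{-3} = \alpha_A/|\mathrm{Im}\mathcal{A}|$ with $\alpha_A = 1$, so the sum in (104) vanishes. In the column-weight-one ensemble, differences of weight 2 and 4 collide much more often, and the sum stays large unless $\alpha_A$ is increased.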